AlphaFold2 uses multiple sequence alignments (MSAs) for structure prediction, whereas models like ESMFold use protein language models instead. This paper takes a middle path: it builds many MSAs from a huge sequence dataset and selects the "best" one.
They combine several MSA generation pipelines to build multiple MSAs from huge genomic and metagenomic sequence databases, then use a deep learning MSA scoring strategy to select the "optimal" MSA. They claim substantial improvements over AlphaFold2 and AlphaFold-Multimer, especially on targets that those models found difficult.
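The generate-then-select idea boils down to three steps: produce candidate MSAs from several pipelines, score each one, and keep the highest-scoring one. Below is a minimal Python sketch of that pattern. It assumes a toy scoring function (alignment depth times mean non-gap coverage) as a stand-in for the paper's learned deep learning scorer. The `MSA` class, `toy_score`, `select_msa`, and the pipeline labels are all hypothetical and are not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MSA:
    source: str           # hypothetical label for the pipeline that produced this MSA
    sequences: List[str]  # aligned rows; the first row is the query, "-" marks a gap


def toy_score(msa: MSA) -> float:
    """Hypothetical stand-in for the paper's deep learning MSA scorer.

    Score = number of sequences (depth) * mean fraction of non-gap columns
    per row (coverage). Purely illustrative.
    """
    if not msa.sequences or not msa.sequences[0]:
        return 0.0
    length = len(msa.sequences[0])
    coverage = sum(
        sum(c != "-" for c in row) / length for row in msa.sequences
    ) / len(msa.sequences)
    return len(msa.sequences) * coverage


def select_msa(candidates: List[MSA], score: Callable[[MSA], float]) -> MSA:
    """Return the candidate MSA with the highest score (raises ValueError if empty)."""
    return max(candidates, key=score)


# Usage: two toy candidates from two hypothetical pipelines.
candidates = [
    MSA("pipeline_a", ["ACDE", "AC-E"]),
    MSA("pipeline_b", ["ACDE", "ACDE", "A-DE"]),
]
best = select_msa(candidates, toy_score)
print(best.source)  # pipeline_b
```

The scorer is passed in as a function, so a learned model could replace `toy_score` without changing the selection logic.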