Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data

AlphaFold2 relies on multiple sequence alignments (MSAs) for structure prediction, whereas models such as ESMFold use a protein language model instead. This paper sits somewhere between the two: it builds several MSAs from a huge sequence dataset and selects the "best" one.

They combine several MSA generation pipelines to build multiple candidate MSAs from huge genomics and metagenomics sequence databases, then use a deep learning MSA scoring strategy to select the "optimal" one. They claim substantial improvements over AlphaFold2 and AlphaFold-Multimer, especially on targets that those models found difficult.
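The "generate several candidates, score each, keep the best" pattern can be sketched as below. This is not DeepMSA2's implementation: the paper scores MSAs with a deep learning model, while this sketch swaps in a simple stand-in score, the effective number of sequences (Neff) at an assumed 80% identity threshold. The MSA representation (a list of equal-length aligned strings), the function names, and the toy pipeline outputs are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical representation: an MSA is a list of equal-length aligned
# sequences, with the query as the first row.
MSA = List[str]


def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned columns where a and b share the same residue,
    ignoring columns where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def neff(msa: MSA, threshold: float = 0.8) -> float:
    """Stand-in score (NOT the paper's deep learning scorer): each sequence is
    down-weighted by the number of sequences (itself included) that share
    at least `threshold` identity with it, so redundant hits count less."""
    total = 0.0
    for s in msa:
        cluster_size = sum(sequence_identity(s, t) >= threshold for t in msa)
        total += 1.0 / cluster_size
    return total


def select_best_msa(candidates: Dict[str, MSA],
                    score: Callable[[MSA], float] = neff) -> str:
    """Return the name of the candidate MSA with the highest score."""
    return max(candidates, key=lambda name: score(candidates[name]))


# Toy outputs from two hypothetical MSA generation pipelines.
candidates = {
    "pipeline_a": ["MKTAYIAK", "MKTAYIAK", "MKTAYIAR"],  # near-duplicate hits
    "pipeline_b": ["MKTAYIAK", "MRSAYLAK", "LKTGFIAK", "MQTAWVSK"],  # diverse hits
}
print(select_best_msa(candidates))  # pipeline_b
```

The `score` argument is pluggable, so a learned scorer like the one the paper describes would replace `neff` without changing the selection logic. Neff is used here only because it is a well-known, easily computed measure of how much non-redundant information an MSA carries.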