Pairing interacting protein sequences using masked language modeling

  • MSA Transformer can fill in masked amino acids in multiple sequence alignments (MSAs) using the surrounding context.

  • This study suggests that this ability allows MSA Transformer to encode coevolution between functionally or structurally coupled amino acids, both within and across protein chains. The authors introduce a method, DiffPALM, that exploits this property of MSA Transformer to generate paired alignments for paralogs (genes that arise from a duplication event and whose proteins have overlapping or redundant functions).

  • Feeding these paired alignments into AlphaFold-Multimer substantially improves structure prediction for some complexes, they say.
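
To make the pairing problem concrete, here is a minimal toy sketch. It is not DiffPALM itself, and the summary above does not describe the method's internals. The sketch assumes that every candidate pair of paralogs from two families has already been given a cost (for example, something derived from a masked-language-model loss) and picks the one-to-one pairing with the lowest total cost by brute force. The function name `best_pairing` and the numbers in `toy_scores` are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def best_pairing(scores):
    """Return the one-to-one pairing with the lowest total cost.

    scores[i][j] is a hypothetical cost for pairing paralog i of family A
    with paralog j of family B (lower means more compatible). The values
    are invented; a real pipeline would have to compute them from a model.
    """
    n = len(scores)
    best_perm, best_total = None, float("inf")
    # Brute force over all n! permutations: only feasible for tiny n.
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(enumerate(best_perm)), best_total


# Made-up costs for three paralogs per family within one species.
toy_scores = [
    [1.5, 0.2, 1.1],
    [0.3, 1.4, 1.2],
    [1.0, 1.3, 0.1],
]

pairs, total = best_pairing(toy_scores)
print(pairs, round(total, 2))
```

Exhaustive search grows factorially with the number of paralogs, so it only serves to show what "choosing a pairing" means; any practical approach needs a more scalable optimization strategy.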
