Accurate proteome-wide missense variant effect prediction with AlphaMissense

Mutations (changes) in a protein sequence can be benign (little or no effect on the protein) or pathogenic (disrupting its function, potentially causing disease). Classifying these effects is an important open challenge: only ~2% of the missense variants observed in humans have been clinically classified as benign or pathogenic, and the clinical significance of the rest is unknown.

A group from DeepMind used AlphaFold2 as the basis for predicting the effects of single amino acid substitutions (missense variants). Their model, AlphaMissense, is trained on population frequency data using both sequence and predicted structural context, and it also incorporates an unsupervised protein language modelling component. AlphaMissense could be a useful resource for understanding the molecular effects of variants on protein function, and could even help clinicians prioritize candidate variants in rare disease diagnostics.

Full abstract: The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.
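The abstract mentions two ways of using the predictions: binning each variant's pathogenicity score into "likely benign" or "likely pathogenic", and averaging scores per gene. Here is a minimal Python sketch of both. It assumes scores in [0, 1] with cutoffs of 0.34 and 0.564 for the likely benign and likely pathogenic classes (as I recall them from the paper; check the released database before relying on them). The helper names and toy records are hypothetical and do not come from the actual AlphaMissense predictions.

```python
from collections import defaultdict
from statistics import mean

# Assumed cutoffs for the coarse classes; treat them as tunable parameters.
BENIGN_MAX = 0.34
PATHOGENIC_MIN = 0.564


def classify(score: float) -> str:
    """Map a pathogenicity score in [0, 1] to a coarse class (hypothetical helper)."""
    if score < BENIGN_MAX:
        return "likely_benign"
    if score > PATHOGENIC_MIN:
        return "likely_pathogenic"
    # Scores between the two cutoffs are left unclassified.
    return "ambiguous"


def gene_mean_scores(records):
    """Average pathogenicity per gene from (gene, variant, score) tuples."""
    by_gene = defaultdict(list)
    for gene, _variant, score in records:
        by_gene[gene].append(score)
    return {gene: mean(scores) for gene, scores in by_gene.items()}


# Hypothetical toy records, NOT real AlphaMissense predictions.
records = [
    ("GENE_A", "M1V", 0.91),
    ("GENE_A", "R42W", 0.72),
    ("GENE_A", "D87N", 0.45),
    ("GENE_B", "K10R", 0.08),
    ("GENE_B", "A55T", 0.12),
]

if __name__ == "__main__":
    for gene, variant, score in records:
        print(gene, variant, score, classify(score))
    print(gene_mean_scores(records))
```

On this reading, a gene whose missense variants mostly score as damaging (a high mean, like GENE_A here) is less tolerant of substitutions. The abstract reports that this kind of gene-level average is predictive of cell essentiality.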
