A multitask neural network trained on embeddings from ESMFold can accurately rank order clinical outcomes for different cystic fibrosis mutations

ESMFold is, in essence, a large language model trained on protein sequences with the goal of predicting 3D protein structure. This study explores whether ESMFold embeddings can be used to predict clinical outcomes for cystic fibrosis (CF) genotypes. A multitask neural network trained on ESMFold embeddings produces predictions that show statistically significant correlations with CF-related phenotypes.
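As a rough illustration of what rank ordering outcomes means in practice, the sketch below computes a Spearman rank correlation between model predictions and observed outcomes. It assumes Spearman's coefficient as the evaluation metric, which the text above does not specify. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, observed):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(rank(predicted), rank(observed))


# Toy values invented for illustration only (not data from the study):
# one model score and one observed clinical measurement per genotype.
predicted = [0.9, 0.4, 0.7, 0.1]
observed = [95.0, 60.0, 80.0, 30.0]
rho = spearman(predicted, observed)
```

Here the predictions order the four toy genotypes exactly as the observed values do, so `rho` is 1.0. In a multitask setting, a coefficient like this would be computed separately for each predicted phenotype.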
