How to make machine learning scoring functions competitive with FEP

A persistent challenge in machine learning is that models generalize poorly beyond their training data. This preprint introduces a GNN-based approach that encodes protein-ligand interactions through ‘atomic environment vectors’, each of which summarizes the local chemical environment of a reference atom. By evaluating on a new out-of-distribution test set and on two alternative benchmark systems used for free energy perturbation (FEP) calculations, the authors identify strategies for improving ML scoring functions.
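
To make the idea of an atomic environment vector more concrete, here is a minimal sketch of a radial-only descriptor in the style of ANI/Behler-Parrinello symmetry functions. It is an illustration under assumptions, not the preprint's featurization: the function names (`cosine_cutoff`, `radial_aev`), the cutoff radius, the width `eta`, and the number of radial shifts are all hypothetical choices, and the actual model may also use angular terms or different parameters.

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Smooth weight that falls from 1 at r=0 to 0 at r=r_cut (0 beyond)."""
    return np.where(r <= r_cut, 0.5 * np.cos(np.pi * r / r_cut) + 0.5, 0.0)

def radial_aev(ref_pos, neighbor_pos, neighbor_elems, elements,
               r_cut=5.0, eta=4.0, n_shifts=8):
    """Radial descriptor of the environment around one reference atom.

    For each element type, neighbor distances are smeared onto a grid of
    Gaussian centres ("shifts") and summed, so the result does not depend
    on neighbor ordering. All defaults are illustrative assumptions.
    """
    shifts = np.linspace(0.5, r_cut, n_shifts, endpoint=False)
    dists = np.linalg.norm(neighbor_pos - ref_pos, axis=1)
    weights = cosine_cutoff(dists, r_cut)
    neighbor_elems = np.asarray(neighbor_elems)

    blocks = []
    for elem in elements:
        mask = neighbor_elems == elem
        d = dists[mask][:, None]       # shape (n_neighbors_of_elem, 1)
        w = weights[mask][:, None]
        gauss = np.exp(-eta * (d - shifts[None, :]) ** 2)
        blocks.append((gauss * w).sum(axis=0))  # shape (n_shifts,)
    return np.concatenate(blocks)

# Toy environment: a reference atom at the origin with three neighbors.
ref = np.zeros(3)
neighbors = np.array([[1.5, 0.0, 0.0],   # carbon, inside the cutoff
                      [0.0, 2.0, 0.0],   # oxygen, inside the cutoff
                      [0.0, 0.0, 9.0]])  # nitrogen, outside the cutoff
elems = ["C", "O", "N"]
aev = radial_aev(ref, neighbors, elems, elements=["C", "N", "O"])
```

In this toy case the vector has one block of radial features per element type, and the nitrogen block is all zeros because that atom lies beyond the cutoff. Summing over neighbors is what makes the descriptor invariant to atom ordering, which is one reason such vectors are convenient as node features.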

https://chemrxiv.org/engage/chemrxiv/article-details/6675a38d5101a2ffa8274f62