
ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries

Drug discovery is really difficult. Finding a molecule that treats a disease is already a hard problem, but a successful drug can’t just treat the disease. The molecule also has to be absorbed by the body, find its way to the diseased area, exert its effect, and then exit the body, all without causing any major damage along the way. Measuring all of these necessary druglike properties in the lab is really slow and expensive, so it would be fantastic if we could predict them computationally.

We built ADMET-AI to solve this problem. ADMET-AI is a machine learning platform that predicts the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of a molecule, which are many of the properties needed in a safe and effective drug. We wanted to make ADMET-AI as accessible as possible to the research community, so we made a free website, wrote a paper, and made the code open source as a pip-installable Python package. Please give it a try, and read on for more details!

Developing ADMET-AI

In the past, we’ve had success using graph neural networks (GNNs) to predict all sorts of properties of molecules, from quantum mechanical energies to human toxicity. So for ADMET-AI, we decided to use the same model architecture, called Chemprop-RDKit, to predict ADMET properties.

How Chemprop-RDKit works

Chemprop-RDKit (Figure 1) contains a GNN called Chemprop. Chemprop computes simple features for each atom (e.g., atom type) and each bond (e.g., bond type), then runs several steps of message passing with neural network layers that aggregate atom and bond information across the molecule into a single representation of the whole molecule. This representation is then concatenated with 200 molecular features computed by RDKit, and the combined representation is passed through several feed-forward neural network layers to predict molecular properties. With this hybrid architecture, Chemprop-RDKit combines the strength of GNNs (understanding local neighborhoods of the molecule) with the power of RDKit (computing features that describe the entire molecule).

Figure 1. ADMET-AI uses machine learning models with the Chemprop-RDKit architecture. Chemprop-RDKit employs both a graph neural network (top) and 200 physicochemical properties computed by RDKit (bottom), which are combined by a feed-forward neural network (right) to predict the properties of a molecule.
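To make the hybrid idea concrete, here is a toy, dependency-free Python sketch of the same three stages: message passing over a molecular graph, concatenation with a molecule-level feature vector, and a feed-forward head. It is not Chemprop's implementation. Chemprop passes messages along directed bonds, while this sketch uses simpler atom-centered messages; the weights are random and untrained; the one-hot atom and bond features are hand-made; and a 4-number placeholder vector stands in for the 200 RDKit features. The names `ToyHybridModel`, `init_layer`, and `linear` are hypothetical.

```python
import math
import random


def init_layer(n_out, n_in, rng):
    """Random weights (list of rows) and zero biases for a toy dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Compute W @ x + b for a (weights, bias) pair."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


class ToyHybridModel:
    """Graph encoder + molecule-level feature vector + feed-forward head (untrained)."""

    def __init__(self, atom_dim, bond_dim, hidden_dim, mol_feat_dim, n_tasks, depth=3, seed=0):
        rng = random.Random(seed)
        self.depth = depth
        self.embed = init_layer(hidden_dim, atom_dim, rng)
        self.message = init_layer(hidden_dim, hidden_dim + bond_dim, rng)
        self.ffn_hidden = init_layer(hidden_dim, hidden_dim + mol_feat_dim, rng)
        self.ffn_out = init_layer(n_tasks, hidden_dim, rng)

    def encode(self, atom_feats, bonds):
        """Message passing over the molecular graph, then a mean readout over atoms."""
        h = [relu(linear(self.embed, a)) for a in atom_feats]
        for _ in range(self.depth):
            incoming = [[0.0] * len(h[0]) for _ in h]
            for i, j, bond_feat in bonds:
                for src, dst in ((i, j), (j, i)):  # bonds are undirected here
                    msg = linear(self.message, h[src] + bond_feat)
                    incoming[dst] = [a + b for a, b in zip(incoming[dst], msg)]
            # Each atom keeps its own state and adds what its neighbors sent.
            h = [relu([a + b for a, b in zip(h_v, m_v)]) for h_v, m_v in zip(h, incoming)]
        return [sum(column) / len(h) for column in zip(*h)]

    def predict(self, atom_feats, bonds, mol_feats):
        graph_vec = self.encode(atom_feats, bonds)
        combined = graph_vec + mol_feats  # concatenate learned + precomputed features
        return linear(self.ffn_out, relu(linear(self.ffn_hidden, combined)))


# Ethanol (CCO): one-hot atom types [C, O] and a one-hot bond type [single].
atoms = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1, [1]), (1, 2, [1])]
mol_feats = [0.5, -1.0, 0.2, 0.0]  # placeholder numbers standing in for RDKit descriptors
model = ToyHybridModel(atom_dim=2, bond_dim=1, hidden_dim=8, mol_feat_dim=4, n_tasks=2)
print(model.predict(atoms, bonds, mol_feats))  # two untrained outputs, one per task
```

The design point the sketch mirrors is the concatenation step: the graph encoder only sees local neighborhoods through repeated message passing, while the appended molecule-level vector gives the feed-forward head whole-molecule descriptors directly.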

Training on benchmark ADMET data

We trained ADMET-AI’s Chemprop-RDKit models using ADMET data from the Therapeutics Data Commons (TDC), which is a great resource of publicly available datasets for drug discovery. The TDC has a leaderboard called the ADMET Benchmark Group with 22 ADMET datasets, and this leaderboard is perfect for comparing different machine learning models for ADMET prediction.

Since we wanted to build the most comprehensive ADMET prediction model possible, we also used an additional 19 ADMET datasets that are in the TDC but are not included in the leaderboard for a total of 41 ADMET datasets (10 regression, 31 classification). We also built two multi-task ADMET datasets, one with all 10 regression datasets and one with all 31 classification datasets, so that we could train multi-task models since they’re often faster and more accurate than training many single-task models.
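One common way to build such a multi-task dataset is to take the union of molecules across tasks and leave a blank wherever a molecule was not measured for a task; training then masks those blanks out of the loss. The sketch below shows that idea for the classification case with hypothetical helper names (`merge_tasks`, `masked_bce`) and made-up labels on placeholder molecules. It is not taken from ADMET-AI's code, and a regression model would use a masked squared error instead.

```python
import math


def merge_tasks(datasets):
    """Merge {task: {smiles: label}} single-task datasets into one multi-task table.

    Returns (task_names, rows), where each row is (smiles, labels) and a label
    is None when the molecule was never measured for that task.
    """
    task_names = sorted(datasets)
    all_smiles = sorted({s for labels in datasets.values() for s in labels})
    rows = [(s, [datasets[t].get(s) for t in task_names]) for s in all_smiles]
    return task_names, rows


def masked_bce(probs, labels):
    """Binary cross-entropy averaged over only the tasks that have a label."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y in zip(probs, labels)
        if y is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Made-up labels for placeholder molecules, only to show the table shape.
toy = {
    "hERG": {"mol_a": 0, "mol_b": 1},
    "AMES": {"mol_a": 0, "mol_c": 1},
}
tasks, table = merge_tasks(toy)
for smiles, labels in table:
    print(smiles, dict(zip(tasks, labels)))
```

Because missing labels contribute nothing to the loss, one model can learn from every dataset at once even though most molecules were measured for only a few properties.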

ADMET-AI outperforms competitors

We first tested ADMET-AI on the 22 datasets from the TDC ADMET leaderboard. We were excited to see that ADMET-AI had the best average rank compared to all other models (Figure 2). While other models are often accurate on only a few ADMET properties, ADMET-AI is the best on average across all the ADMET properties, so it’s ideal for predicting the entire spectrum of ADMET properties at once.

Figure 2. The average rank of machine learning models evaluated on the 22 datasets in the TDC ADMET leaderboard. ADMET-AI has the best average rank compared to all other models evaluated across these datasets (error bars show standard error across datasets).
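The average-rank comparison in Figure 2 can be illustrated with a short sketch: rank the models on each dataset (respecting whether that dataset's metric should be high, like AUROC, or low, like MAE), average each model's ranks across datasets, and report the standard error of those ranks. This is one plausible way to produce such a plot, not necessarily the exact procedure used for Figure 2. The scores below are made up, `average_ranks` is a hypothetical helper, ties receive the average of their positions, and every model is assumed to have a score on every dataset.

```python
import math
import statistics


def average_ranks(scores, higher_is_better):
    """Rank models on each dataset and average the ranks across datasets.

    scores: {dataset: {model: metric}}
    higher_is_better: {dataset: bool}, since e.g. AUROC rises while MAE falls
    Returns {model: (mean_rank, standard_error)}. Ties get the average rank.
    """
    ranks = {}
    for dataset, by_model in scores.items():
        ordered = sorted(by_model.values(), reverse=higher_is_better[dataset])
        for model, value in by_model.items():
            # Average of the 1-based positions this value occupies (handles ties).
            positions = [i + 1 for i, v in enumerate(ordered) if v == value]
            ranks.setdefault(model, []).append(sum(positions) / len(positions))
    result = {}
    for model, rs in ranks.items():
        se = statistics.stdev(rs) / math.sqrt(len(rs)) if len(rs) > 1 else 0.0
        result[model] = (statistics.mean(rs), se)
    return result


toy_scores = {
    "dataset_1": {"A": 0.90, "B": 0.80, "C": 0.85},  # e.g. AUROC: higher is better
    "dataset_2": {"A": 0.40, "B": 0.50, "C": 0.30},  # e.g. MAE: lower is better
}
direction = {"dataset_1": True, "dataset_2": False}
for name, (mean_rank, se) in sorted(average_ranks(toy_scores, direction).items()):
    print(f"{name}: mean rank {mean_rank:.2f} ± {se:.2f}")
```

Averaging ranks rather than raw metrics lets datasets with different metrics and scales count equally, which is what makes a single "best on average" comparison possible.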

We then tested ADMET-AI on the full set of 41 TDC ADMET datasets. We saw that ADMET-AI worked well both as 41 single-task models (one per dataset) and as two multi-task models (one for all regression datasets, one for all classification datasets). Since running two multi-task models is much faster than running 41 single-task models, we decided to use the multi-task models in our website and Python package.

Building the ADMET-AI Website

We wanted to make fast and accurate ADMET prediction easily accessible to researchers regardless of computational background, so we built the user-friendly ADMET-AI website. If you visit the website, you can input up to 1,000 molecules at a time by entering SMILES, uploading a CSV file, or drawing molecules with an interactive tool (Figure 3). You can then make predictions and view them on the website or download them as a CSV. The website displays both a radar plot summary of five key druglike properties (Figure 4) and prediction values for all 41 ADMET properties (Figure 5).

Figure 3. On the ADMET-AI website, users can input up to 1,000 molecules at a time by entering SMILES, uploading a CSV file, or drawing molecules with an interactive tool.
Figure 4. ADMET-AI displays a radar plot summarizing five key druglike properties of the molecule along with a visualization of the molecule’s structure.
Figure 5. ADMET-AI shows computed values for eight physicochemical endpoints along with predicted values for 41 ADMET properties (“Value” column). ADMET-AI also computes the percentile of each prediction compared to 2,579 approved drugs in the DrugBank (“DrugBank Percentile” column). The percentile provides valuable context for the molecule compared to approved drugs. (Note: Some ADMET properties are not displayed in this figure to save space.)
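For libraries larger than the website's 1,000-molecule limit, one option is to split the SMILES into several CSV files and upload them one at a time. The sketch below assumes a single header column named `smiles`; the exact CSV layout the website expects is an assumption here, so check the upload form before relying on it. The helper names `batches` and `to_csv_text` are hypothetical.

```python
import csv
import io

WEBSITE_LIMIT = 1000  # molecules per submission, as stated in the text


def batches(smiles_list, batch_size=WEBSITE_LIMIT):
    """Split a SMILES list into consecutive chunks of at most batch_size."""
    return [smiles_list[i:i + batch_size] for i in range(0, len(smiles_list), batch_size)]


def to_csv_text(smiles_batch, column="smiles"):
    """Render one batch as CSV text with a single (assumed) header column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([s] for s in smiles_batch)
    return buffer.getvalue()


print(to_csv_text(["CCO", "CC"]))
```

Each string from `to_csv_text` can then be written to its own file, for example with `pathlib.Path(...).write_text(...)`. For much larger libraries, the local Python package described below avoids the limit altogether.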

There are several other publicly available ADMET prediction websites, such as SwissADME and ADMETlab 2.0, but ADMET-AI stands out with three key differences: (1) contextual predictions, (2) local predictions, and (3) speed.

Contextualizing ADMET Predictions

ADMET predictions can be difficult to interpret in isolation because different categories of drugs have different requirements on their molecular properties. For example, high toxicity is acceptable as a side effect of chemotherapies since there are often no better alternatives, but high toxicity is generally unacceptable for drugs like antibiotics since toxicity can outweigh the benefits of the drug, especially compared to safe alternatives.

We thought that we could provide context for ADMET-AI’s predictions by comparing predictions on new molecules to predictions on known drugs. To do this, we created a reference set of 2,579 drugs from the DrugBank that have obtained regulatory approval. We then applied ADMET-AI to make ADMET predictions on all of these molecules. Now, when ADMET-AI makes predictions for a new molecule, it also computes the percentile of each ADMET prediction with respect to the 2,579 reference approved drugs (Figure 5). For example, if a molecule’s predicted toxicity falls at the 90th percentile, its predicted toxicity is higher than that of 90% of the approved drugs in the reference set.
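A percentile against a reference set boils down to counting how many reference predictions fall below the new value. The sketch below uses a strict "below" convention and ten made-up reference values standing in for the 2,579 DrugBank predictions; how ADMET-AI itself handles ties is not stated here, so treat that convention as an assumption. `percentile_vs_reference` is a hypothetical name.

```python
from bisect import bisect_left


def percentile_vs_reference(value, reference_values):
    """Percent of reference predictions strictly below `value`, on a 0-100 scale."""
    if not reference_values:
        raise ValueError("reference set is empty")
    ordered = sorted(reference_values)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


# Ten made-up reference predictions: 0.1, 0.2, ..., 1.0.
reference = [i / 10 for i in range(1, 11)]
print(percentile_vs_reference(0.95, reference))  # 90.0
```

Sorting once and using binary search keeps each lookup cheap when many new molecules are compared against the same reference set. SciPy's `scipy.stats.percentileofscore` with `kind="strict"` follows the same strict-below convention if SciPy is available.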

We wanted to provide users with even more fine-grained context, so we also added an option to let users pick an Anatomical Therapeutic Chemical (ATC) code, such as “antibiotics for topical use,” to create a more refined reference set for computing percentiles (Figure 6). If you select an ATC code, you’ll then see your proposed molecule compared to only approved drugs labeled with that ATC code, so you can see if your molecule shares the same range of ADMET properties as approved drugs for the same disease.

Figure 6. ADMET-AI users can select an Anatomical Therapeutic Chemical (ATC) code to define a subset of the DrugBank reference set of 2,579 approved compounds to better contextualize the ADMET predictions for their molecules. Here, “antibiotics” is selected to create a reference set of 40 compounds.
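Because ATC codes are hierarchical, restricting the reference set can be sketched as a prefix match on each drug's codes followed by the usual percentile calculation. In the sketch below the drugs are placeholders with made-up toxicity scores (not DrugBank data), the helper names are hypothetical, and which ATC codes the website groups under a label like “antibiotics” is not specified here.

```python
from bisect import bisect_left


def filter_by_atc(reference, atc_prefix):
    """Keep reference drugs with at least one ATC code starting with atc_prefix.

    ATC codes are hierarchical (e.g. "L01" covers antineoplastic agents and
    "D06A" covers antibiotics for topical use), so a prefix selects a subtree.
    """
    return [d for d in reference if any(code.startswith(atc_prefix) for code in d["atc_codes"])]


def subset_percentile(value, reference, atc_prefix, prop):
    """Percent of the ATC-filtered reference drugs whose `prop` is strictly below value."""
    subset = filter_by_atc(reference, atc_prefix)
    if not subset:
        raise ValueError(f"no reference drugs match ATC prefix {atc_prefix!r}")
    ordered = sorted(d[prop] for d in subset)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


# Placeholder drugs with made-up predicted toxicity scores (not real DrugBank data).
reference = [
    {"name": "drug_a", "atc_codes": ["L01XA01"], "toxicity": 0.8},
    {"name": "drug_b", "atc_codes": ["L01BA01"], "toxicity": 0.6},
    {"name": "drug_c", "atc_codes": ["J01CA04"], "toxicity": 0.1},
    {"name": "drug_d", "atc_codes": ["D06AX09", "R01AX06"], "toxicity": 0.2},
]

new_molecule_toxicity = 0.5
print(subset_percentile(new_molecule_toxicity, reference, "L01", "toxicity"))  # 0.0
print(subset_percentile(new_molecule_toxicity, reference, "J01", "toxicity"))  # 100.0
```

The same toxicity score lands at the bottom of the antineoplastic subset but at the top of the antibacterial one, which is the kind of contrast Figure 7 illustrates.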

To make it easier to visualize this information, ADMET-AI also creates a scatterplot that compares the input molecules to the reference DrugBank molecules based on two user-selected ADMET properties (Figure 7).

Figure 7. Scatterplots produced by ADMET-AI showing the toxicity and absorption of an input molecule (red star) compared to the DrugBank reference set of approved drugs filtered to either antineoplastic agents (left) or antibiotics (right). The context provided by the reference sets shows that the input molecule would have acceptable toxicity for an antineoplastic agent but not for an antibiotic.

Local Predictions

We made the ADMET-AI website easy to use, but for more computationally oriented users, ADMET-AI is also available as a Python package for making predictions locally. The ADMET-AI package contains the same models as the website but can be run on faster hardware (e.g., a GPU and more CPUs) with no limit on the number of molecules per prediction. You can install ADMET-AI with pip install admet-ai, and then you can make ADMET predictions using the admet_predict command-line tool or by importing the ADMETModel class in Python.

```bash
# Installation
pip install admet-ai

# Command line interface
admet_predict --data_path /path/to/smiles.csv
```

```python
# Python module
from admet_ai import ADMETModel

# Load ADMET-AI model
model = ADMETModel()

# Predict on one molecule => dictionary
admet_predictions = model.predict(smiles="CCO")
# {"hERG": 0.01, ..., "ClinTox": 4.79e-04}

# Predict on multiple molecules => Pandas DataFrame
admet_predictions = model.predict(smiles=["CCO", "CC"])
#       hERG  ...   ClinTox
# CCO   0.01  ...  4.79e-04
# CC    0.11  ...  5.40e-06
```

Speed

One big benefit of ADMET-AI is that it’s much faster than other publicly available ADMET prediction tools. ADMET-AI can make predictions on 1,000 molecules in just 1 minute and 41 seconds compared to over 3 minutes for the next fastest model (vNN-ADMET) and 1 hour and 26 minutes for SwissADME (Figure 9).

Users can also run ADMET-AI locally with more powerful hardware for even greater speed. On a local machine with 32 CPU cores and a GPU, ADMET-AI can make predictions on one million molecules in just 3.1 hours. This means it’s realistic to make ADMET predictions on large-scale chemical libraries for drug screening using ADMET-AI.

Figure 9. The time for ADMET-AI and other publicly available ADMET websites to make predictions on 1, 10, 100, or 1,000 molecules. ADMET-AI is the fastest website for any number of molecules.
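The quoted timings translate into per-molecule costs with a little arithmetic. The snippet below only re-derives numbers from the figures stated above and does not benchmark anything. The website and local runs use different hardware, and website timings may include network overhead, so comparisons between them are rough. The helper names are hypothetical.

```python
def to_seconds(hours=0.0, minutes=0.0, secs=0.0):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + secs


def ms_per_molecule(total_seconds, n_molecules):
    """Average wall-clock milliseconds spent per molecule."""
    return 1000 * total_seconds / n_molecules


# Timings quoted in the text above.
web_admet_ai = to_seconds(minutes=1, secs=41)    # ADMET-AI website, 1,000 molecules
web_swissadme = to_seconds(hours=1, minutes=26)  # SwissADME website, 1,000 molecules
local_admet_ai = to_seconds(hours=3.1)           # local ADMET-AI, 1,000,000 molecules

web_ms = ms_per_molecule(web_admet_ai, 1_000)
local_ms = ms_per_molecule(local_admet_ai, 1_000_000)
local_molecules_per_second = 1_000_000 / local_admet_ai
swissadme_time_ratio = web_swissadme / web_admet_ai

print(f"ADMET-AI website: {web_ms:.0f} ms per molecule")
print(f"ADMET-AI local: {local_ms:.1f} ms per molecule "
      f"(about {local_molecules_per_second:.0f} molecules per second)")
print(f"SwissADME takes about {swissadme_time_ratio:.0f}x as long as the ADMET-AI website")
```

With these inputs, the website works out to roughly 0.1 s per molecule, the local run to roughly 11 ms per molecule (about 90 molecules per second), and SwissADME to roughly 51 times the ADMET-AI website time for 1,000 molecules.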

Conclusion

ADMET-AI is a simple, fast, and accurate platform for ADMET prediction with both an easy-to-use website and a powerful Python package. ADMET-AI is the most accurate model on average across the TDC ADMET leaderboard and is significantly faster than all other publicly available ADMET websites. It also uniquely provides context for ADMET predictions by comparing the ADMET properties of new molecules to those of approved drugs in relevant therapeutic categories. As a fully free and open-source platform, ADMET-AI can be a powerful drug discovery tool for identifying compounds with favorable ADMET properties for further development. We hope you’ll try ADMET-AI, and please feel free to reach out with any questions or feedback!
