Introducing Medchem: a Python library for molecular prioritization and filtering

Hadrien Mary · Scientist - Lead of the Research Translation Unit @ Valence Labs

TLDR: Medchem is now open-source! Get started at https://github.com/datamol-io/medchem and follow along at @datamol_io.

Medchem is a Python library for molecular filtering and prioritization. It bundles hundreds of well-known and novel molecular filters, alerts, and rules to help you triage and prioritize large lists of compounds at scale.


Molecular prioritization and filtering

In drug discovery, in silico and in cerebro methods routinely produce a large number of potential drug candidates, especially since the advent of machine learning techniques. These advances are valuable, but the sheer volume of candidates can be overwhelming. This is where a tool equipped with medicinal chemistry (medchem for short) filters, alerts, and rules becomes invaluable. It helps flag candidates with likely safety or efficacy liabilities and prioritizes the rest, so that the most promising ones get attention first.

However, rules and filters are always context-specific. Applying them blindly, without considering the context of each drug discovery project, can produce false negatives (viable compounds wrongly discarded) and missed opportunities. And despite the power of machine learning, the knowledge encoded in expert rules remains indispensable. These rules, refined over years of research and experience, complement machine learning tools. Combining the two helps us find not just more candidates, but better ones.

Introducing Medchem

Meet "Medchem", a Python library for molecular filtering and prioritization.

At Valence Labs, we have been developing Medchem for a few years, and it has become a cornerstone alongside our machine learning drug discovery pipelines for generating and prioritizing drug candidates automatically and at scale.

Medchem unifies existing, well-known medicinal chemistry alerts, rules, and filters (Ro5, Lilly rules, etc.) and adds novel ones. Generative filters triage compounds produced by deep learning molecular models, and complexity filters act as a proxy for synthetic feasibility.
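To make concrete what a rule such as Ro5 checks, here is a minimal, library-free sketch of Lipinski's Rule of Five applied to precomputed descriptors. It is not Medchem's implementation: the `Descriptors` container, the `passes_rule_of_five` function, and the `max_violations` parameter are hypothetical names introduced for illustration, and the descriptor values are typed in by hand rather than computed from a molecule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hand-supplied physicochemical descriptors for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def passes_rule_of_five(d: Descriptors, max_violations: int = 0) -> bool:
    """Return True if at most `max_violations` Lipinski criteria are broken."""
    violations = sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])
    return violations <= max_violations


# Approximate values for aspirin, for illustration only.
aspirin = Descriptors(mol_weight=180.16, logp=1.2, h_donors=1, h_acceptors=4)
# A made-up profile that is deliberately too heavy and too lipophilic.
bulky = Descriptors(mol_weight=720.0, logp=6.3, h_donors=2, h_acceptors=9)

print(passes_rule_of_five(aspirin))                   # True
print(passes_rule_of_five(bulky))                     # False (two violations)
print(passes_rule_of_five(bulky, max_violations=2))   # True
```

The `max_violations` knob reflects the point about context made earlier: some formulations of the rule tolerate a violation, and how strict to be is a per-project decision.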

The main features of Medchem are:

- 📈 Built for Scale: With built-in parallelization, Medchem is designed to handle large-scale filtering tasks with ease, ensuring that no molecule is left unexamined.

- 🤗 User-Friendly: Its high-level interface ensures ease of use, making it perfect for quick prototyping and iterative development.

- 🏭 Flexible Integration: For those looking to integrate Medchem into custom or complex production pipelines, the library offers a low-level API, ensuring seamless integration without compromising on functionality.

- 👩‍🔧 Open Source & Extensible: True to the spirit of community-driven development, Medchem is open source. This ensures transparency and allows for extensibility, empowering users to tailor the library to their unique needs.

Here is a glimpse of the Medchem API:

import medchem as mc

# Apply the Rule of Five to a molecule
do_pass = mc.rules.basic_rules.rule_of_five("CC(=O)OC1=CC=CC=C1C(=O)O")

# Filter a large list of molecules using the NIBR filters set
filter_ok = mc.functional.nibr_filter(
    mols=my_list_of_molecules,
    n_jobs=-1,
    progress=True,
)

# Detect whether a list of molecules contains potential hinge binders features
group = mc.groups.ChemicalGroup(groups=["hinge_binders"])
has_matches = group.has_match(my_list_of_molecules)

Take a look at the documentation for more examples and tutorials.
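Once a filter has run, a common next step is to keep only the molecules that passed. The sketch below assumes the filter result is a sequence with one boolean per input molecule, in input order; that assumption should be checked against the documentation for the specific filter you call. The SMILES list and the mask values here are hand-written placeholders, not real filter output.

```python
from itertools import compress

# Placeholder inputs: in practice the mask would come from a filter call;
# these values are made up and do not reflect any real filter result.
smiles = ["CCO", "c1ccccc1", "CC(=O)OC1=CC=CC=C1C(=O)O"]
filter_ok = [True, False, True]

# Keep only the molecules whose mask entry is truthy.
kept = list(compress(smiles, filter_ok))
# Collect the rejected molecules separately for later inspection.
rejected = [s for s, ok in zip(smiles, filter_ok) if not ok]

print(kept)      # ['CCO', 'CC(=O)OC1=CC=CC=C1C(=O)O']
print(rejected)  # ['c1ccccc1']
```

Keeping the rejected molecules around, rather than dropping them silently, makes it easier to review whether a given filter is appropriate for the project at hand.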

An ecosystem of tools for drug discovery

Medchem is part of the broader Datamol.io ecosystem of open tools that are developed at Valence Labs.

Let us know what you think of Medchem!

Medchem is a mature library that we've been using internally at Valence for years. We're excited to open-source it today and hope it will benefit the scientific community!

You can check out our tutorials at medchem-docs.datamol.io. We welcome your feedback on the GitHub repository, on Valence Portal or on Twitter!
