Augmented Memory: Capitalizing on Experience Replay to Accelerate Generative Molecular Design

Jeff Guo
 · PhD Student @ EPFL

Generative drug design has moved from a proof-of-concept paradigm to real-world impact, as observed in the recent surge of works reporting experimental validation (verified binding in cell assays) for both distribution learning [1-14] and goal-directed [15-21] approaches. In goal-directed generation, the objective is to explicitly optimize for a target property profile as evaluated by an Oracle, defined here as a computational method that returns a property value prediction. A ubiquitous example is molecular docking [22], which returns a docking score as a proxy for binding affinity. While docking is a valuable tool, its simplifying assumptions, such as treating the protein as a static and rigid structure, limit its accuracy.

Consequently, a proper treatment of ligand-protein binding requires considering protein dynamics through molecular dynamics (MD) simulations, as used in free energy perturbation (FEP) [23]. Since these MD simulations can often (though not always) predict binding affinity to within 1 kcal/mol, why not directly optimize them during goal-directed generation? Unfortunately, each MD oracle call requires GPU hours, so this is currently infeasible. The broader purpose of presenting this specific example is to convey that there is generally a trade-off between predictive accuracy and computational cost. This leads to the research problem and contribution of Augmented Memory [24]: improving Sample Efficiency, defined as how few oracle evaluations are required to design molecules with the target property profile. Improved sample efficiency will enable more efficient goal-directed generation and unlock the ability to explicitly optimize expensive high-fidelity oracles.

Part 1: Base Generative Model

Augmented Memory builds on REINVENT [25, 26] and uses a recurrent-neural network (RNN) with long short-term memory (LSTM) cells [27]. Molecules are represented as SMILES [28] and the base generative model is trained to maximize the likelihood of a dataset of SMILES sequences. From this base model (Prior), the Augmented Memory reinforcement learning (RL) algorithm is applied to find a Policy that maximizes the expected reward.
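The two objectives named in this part can be written compactly. The notation below is our own shorthand (a SMILES sequence x with tokens x_t, a training set D, a reward R returned by the oracle) and is not taken from the paper:

```latex
% Prior: maximum-likelihood training over a dataset D of SMILES sequences
\theta_{\text{Prior}} = \arg\max_{\theta} \sum_{x \in \mathcal{D}} \sum_{t=1}^{T_x} \log p_{\theta}\left(x_t \mid x_{<t}\right)

% RL: find a Policy \pi_\theta (initialized from the Prior) that maximizes the expected reward
\theta^{*} = \arg\max_{\theta} \; \mathbb{E}_{x \sim \pi_{\theta}}\left[ R(x) \right]
```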

Part 2: Augmented Memory Reinforcement Learning Algorithm

Augmented Memory shares a part of REINVENT's optimization algorithm. During the course of RL, the highest rewarding molecules generated so far are stored in the Replay Buffer (with a maximum capacity of 100). In REINVENT, 10 SMILES are randomly sampled from the replay buffer and “replayed” to the model in the form of Experience Replay. The results in the Augmented Memory paper show that experience replay drastically improves the performance of REINVENT and previously proposed algorithmic modifications, including Best Agent Reminder (BAR) [29] and Augmented Hill Climbing (AHC) [30]. Augmented Memory capitalizes on this observation with the key insight that performing experience replay with the entire replay buffer significantly improves sample efficiency at the expense of increased susceptibility to Mode Collapse (the model becomes stuck and generates the same molecules repeatedly). To counter mode collapse, SMILES augmentation [31] is applied. The SMILES representation is not unique: alternative atom numberings of the molecular graph yield different SMILES sequences that map to the same molecule. Appendix D in the paper shows that SMILES augmentation is effectively a regularizer and enables Augmented Memory to retain sample efficiency gains while preventing mode collapse. (An interesting observation from Appendix C is that backpropagating using only the replay buffer can also work.)
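To make the replay buffer concrete, here is a minimal Python sketch. The class and method names are hypothetical, and keying the buffer by SMILES string (so duplicates keep only their best reward) is our assumption; only the capacity of 100, the random sample of 10, and the "replay everything" variant come from the text above.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplayBuffer:
    """Keeps the highest rewarding SMILES seen so far (hypothetical sketch)."""
    capacity: int = 100
    entries: dict = field(default_factory=dict)  # SMILES -> reward

    def add(self, smiles: str, reward: float) -> None:
        # Assumption: a repeated SMILES keeps only its best reward.
        if reward > self.entries.get(smiles, float("-inf")):
            self.entries[smiles] = reward
        # Evict the lowest rewarding entries until we are within capacity.
        while len(self.entries) > self.capacity:
            worst = min(self.entries, key=self.entries.get)
            del self.entries[worst]

    def sample(self, n: int = 10, rng=random) -> list:
        # REINVENT-style replay: a random subset of the buffer.
        items = list(self.entries.items())
        return rng.sample(items, min(n, len(items)))

    def everything(self) -> list:
        # Augmented Memory-style replay: the entire buffer, best first.
        return sorted(self.entries.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `sample(10)` corresponds to the replay described for REINVENT, while `everything()` corresponds to replaying the entire buffer.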

Augmented Memory follows 4 general steps (the differences from REINVENT are in steps 3 and 4):

  1. Sample a batch of trajectories (SMILES)

  2. Compute the reward of each molecule with the oracle function

  3. Augment both the sampled SMILES and all SMILES in the replay buffer

  4. Backpropagate using all augmented SMILES 

Repeat steps 3 and 4 N times (N = 2 is found to be stable) before returning to step 1.
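The four steps can be sketched as a single function. This is an illustration under stated assumptions, not the authors' implementation: all callables (`sample_batch`, `oracle`, `augment`, `update_policy`) are hypothetical placeholders, the replay buffer is a plain dictionary, and updating the buffer right after scoring is our assumption about ordering.

```python
from typing import Callable, Dict, List, Tuple


def augmented_memory_step(
    sample_batch: Callable[[], List[str]],
    oracle: Callable[[str], float],
    augment: Callable[[str], str],
    update_policy: Callable[[List[Tuple[str, float]]], None],
    replay_buffer: Dict[str, float],
    augmentation_rounds: int = 2,
    buffer_capacity: int = 100,
) -> int:
    """Run one epoch of the four steps; returns the number of oracle calls."""
    # Step 1: sample a batch of trajectories (SMILES).
    batch = sample_batch()
    # Step 2: compute the reward of each molecule (one oracle call each).
    scored = [(smiles, oracle(smiles)) for smiles in batch]

    # Assumption: the buffer is refreshed here, keeping the top rewards.
    for smiles, reward in scored:
        replay_buffer[smiles] = max(reward, replay_buffer.get(smiles, reward))
    overflow = max(0, len(replay_buffer) - buffer_capacity)
    for worst in sorted(replay_buffer, key=replay_buffer.get)[:overflow]:
        del replay_buffer[worst]

    for _ in range(augmentation_rounds):
        # Step 3: augment the sampled SMILES and ALL SMILES in the buffer.
        augmented = [(augment(s), r) for s, r in scored]
        augmented += [(augment(s), r) for s, r in replay_buffer.items()]
        # Step 4: backpropagate using all augmented SMILES.
        update_policy(augmented)

    return len(scored)
```

Note that the oracle is called only in step 2, so the repeated rounds of steps 3 and 4 add learning updates without spending any additional oracle budget.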

Augmented Memory’s sample efficiency improvements come from two sources: experience replay using the entire replay buffer and backpropagating multiple times per epoch (learning multiple times from the same examples).

Part 3: Results

In the paper, 3 experiments were performed:

  1. Aripiprazole Similarity

  2. Practical Molecular Optimization (PMO) Benchmark [32] 

  3. DRD2 Inhibitor Design

In this blog, only the PMO benchmark results are shown. The PMO benchmark was recently proposed to assess the sample efficiency of molecular optimization methods. The authors evaluated 25 models across 23 optimization tasks and found that REINVENT was the (previously) best-performing model. The results in the Augmented Memory paper support two conclusions:

  1. Augmented Memory achieves the new state-of-the-art on the PMO benchmark and outperforms REINVENT by 2-3 model ranks (based on the AUC Top-10 difference between adjacently ranked models).

  2. Adding experience replay to all algorithms improves their performance (see BAR and AHC)
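The AUC Top-10 metric mentioned above is, in the PMO benchmark, the area under the curve of the running top-10 average score plotted against the number of oracle calls, so a method that finds good molecules early scores higher. The sketch below is a simplified per-call version with rectangle-rule integration; the function name and the padding behaviour are our assumptions, and it is not the benchmark's own code.

```python
import heapq
from typing import Optional, Sequence


def auc_top_k(scores: Sequence[float], k: int = 10,
              budget: Optional[int] = None) -> float:
    """Normalized area under the running top-k mean vs. oracle calls.

    `scores` lists oracle outputs in the order the calls were made and is
    assumed to lie in [0, 1]. If the run stops before `budget` calls, the
    last top-k mean is carried forward (an assumption of this sketch).
    """
    budget = len(scores) if budget is None else budget
    if budget <= 0:
        return 0.0
    top: list = []   # min-heap holding the k best scores seen so far
    area = 0.0
    current = 0.0
    used = 0
    for score in scores[:budget]:
        if len(top) < k:
            heapq.heappush(top, score)
        elif score > top[0]:
            heapq.heapreplace(top, score)
        current = sum(top) / len(top)
        area += current
        used += 1
    area += current * (budget - used)  # pad the unused budget
    return area / budget
```

For example, with `k=1` a run that finds its best molecule on the first of four calls gets a higher value than one that finds the same molecule only on the fourth call, even though both end with the same best score.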

Part 4: Practical Application

Augmented Memory runs by passing configuration files which specify all parameters of the generative experiment. To reproduce the results in the Augmented Memory paper, prepared configuration files and instructions are provided here.

If using Augmented Memory for other design tasks, existing tutorials for executing REINVENT will be particularly relevant as the set-up is nearly identical (except for a few new parameters which will be presented in this section). Jupyter tutorial notebooks are provided here with instructions.

The only difference in executing Augmented Memory compared to REINVENT is the following additional parameters in the configuration files:

  • “optimization_algorithm”: specifies which optimization algorithm to use and the options are “augmented_memory”, “bar” for Best Agent Reminder [29], and “ahc” for Augmented Hill Climbing [30].

  • “augmented_memory”: denotes whether to perform multiple backpropagations with augmented SMILES which is the mechanism of Augmented Memory. If set to false, then the optimization algorithm is identical to REINVENT.

  • “augmentation_rounds”: specifies how many backpropagations to perform with augmented SMILES. 2 is found to be the most stable.

  • “selective_memory_purge”: mechanism to promote diversity in the generated set. In general, it is recommended to set this to true.

With these parameters, the full configuration file for Augmented Memory is specified! 
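As an illustration, the four parameters could be collected as follows. Only the four key names and the recommended values come from the list above; the variable name is hypothetical and where these keys sit inside the full configuration file is not shown here, so treat this as a fragment rather than a complete configuration.

```python
import json

# Hypothetical fragment: only the key names and recommended values are
# taken from the parameter list above; the nesting is an assumption.
augmented_memory_parameters = {
    "optimization_algorithm": "augmented_memory",  # or "bar", "ahc"
    "augmented_memory": True,        # False reverts to REINVENT behaviour
    "augmentation_rounds": 2,        # 2 is reported as the most stable
    "selective_memory_purge": True,  # recommended, promotes diversity
}

# Serialize to JSON text, as configuration files are typically stored.
fragment_json = json.dumps(augmented_memory_parameters, indent=2)
print(fragment_json)
```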

Now we move to the bread-and-butter of Goal-directed Generation. Augmented Memory is a general optimization algorithm and the usefulness depends on the multi-parameter optimization objective. In this regard, Augmented Memory uses a flexible Scoring Function which defines all the oracles to be simultaneously optimized. See the following file for a list of oracles that can be optimized, which range from physico-chemical properties like molecular weight, to structure-based oracles such as molecular docking, and quantum-mechanical properties such as ionization potential energy.
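To illustrate how several oracles can be optimized simultaneously, the sketch below combines oracle outputs into a single reward. The function name is hypothetical and the aggregation used here, a weighted geometric mean of values in [0, 1], is an assumption chosen for illustration; consult the configuration documentation for the aggregation actually applied.

```python
import math
from typing import Callable, List, Tuple

Oracle = Callable[[str], float]  # maps a SMILES string to a value in [0, 1]


def make_scoring_function(components: List[Tuple[Oracle, float]]) -> Oracle:
    """Combine (oracle, weight) pairs into one reward function.

    Assumption: rewards are aggregated with a weighted geometric mean, so
    a molecule must do reasonably well on every oracle to score highly.
    """
    total_weight = sum(weight for _, weight in components)

    def score(smiles: str) -> float:
        log_sum = 0.0
        for oracle, weight in components:
            value = oracle(smiles)
            if value <= 0.0:
                return 0.0  # any failed component zeroes the reward
            log_sum += weight * math.log(value)
        return math.exp(log_sum / total_weight)

    return score
```

Adding another oracle is then a matter of appending one more `(oracle, weight)` pair to the list, which mirrors the modular design of the Scoring Function.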

The Scoring Function is modular and a variable number of oracles can be specified by appending to a list in the configuration file. Below is an example of an oracle which in this case is DockStream (wrapper package for molecular docking algorithms) [33]. A specific tutorial for executing DockStream can be found here (also covers all the parameters).

Part 5: Outlook

In this final section, the overarching theme is providing a perspective on why one may use a SMILES-based model. We discuss three topics that contextualize Augmented Memory in the field of generative molecular design and provide an outlook on the sample efficiency problem.

5.1 SMILES-based models capture the training data distribution

Models operating on SMILES were amongst the first to be proposed for generative molecular design and have been extensively benchmarked in GuacaMol [34], MOSES [35], and PMO [32]. For a generative model to be useful, it must satisfy the baseline metrics of high Validity (generates valid molecules), Uniqueness (does not generate only the same molecules repeatedly), and Novelty (generates molecules not in the training data). We note that achieving exact numbers in these metrics is not the be-all and end-all, and differences of a few percent often have no effect on practical applications. Consider, for example, 97% validity vs. 99% validity: when sampling molecules is computationally inexpensive, invalid molecules can simply be discarded. Nonetheless, these metrics are useful sanity checks to verify the generative model is performing as desired. In the context of language-based molecular generative models, satisfying these metrics is a solved problem, as shown in the GuacaMol and MOSES benchmarks. Subsequently, assessing the base model can mostly be delegated to assessing whether it captures the training data distribution. Language-based models can accomplish this and even learn irregular distributions [36].

The take-home message of this subsection is that SMILES-based models work: the base generative model can be easily trained and it properly captures the training data distribution. 

5.2 Language-based models are not necessarily 3D-naïve

One limitation of SMILES-based models can be insufficient treatment of 3D structure, as differences in stereochemistry (spatial arrangements of atoms) can have an enormous impact on biological activity. More recent approaches such as graph-based models bring innovations that enable 3D-aware generation, for example by conditioning generation on the context of a protein binding site [37, 38]. In this regard, these models explicitly handle 3D. However, this does not mean language-based molecular generative models are completely 3D-naïve. While generation may be in 1D (such as a SMILES sequence), the model can receive feedback from 3D-aware oracles. For example, molecular docking can take as input a SMILES sequence, enumerate possible stereoisomers, and then proceed to dock the entire set. Oracle feedback to the model can then make the generative process implicitly 3D-aware. For example, in the DockStream [33] work, the REINVENT framework, which generates SMILES, was tasked to optimize 15 different docking configurations (spanning open-source and proprietary docking tools). In all cases, the model learns to generate SMILES that, when enumerated into stereoisomers, achieve a high docking score and have a predicted docking pose complementary to the protein binding site. However, this is not without disadvantages: given a single SMILES sequence, the entire enumerated set needs to be docked, which is inefficient. Furthermore, some questions remain to be fully answered: can conditioned generation work better than traditional computational chemistry oracles (particularly in cases where there is relatively little training data)? If so, in which cases?

The take-home message of this subsection is that language-based molecular generative models that generate in 1D are not necessarily completely 3D-naïve. However, explicitly 3D-aware generative models confer advantages such as direct 3D generation or 3D-conditioned generation, which are reasons why there is so much exciting research going on.

5.3 Towards Sufficient Sample Efficiency

Finally, we have discussed that language models can:

  1. Learn the training data distribution 

  2. Be made implicitly 3D-aware

We circle back to the introduction of this blog, where sample efficiency was first introduced. Augmented Memory satisfies 1 and 2 and makes a contribution to improving sample efficiency as a step towards enabling direct optimization of expensive high-fidelity oracles. In this subsection, we provide a perspective on what sufficient sample efficiency means. In one sentence: sufficient sample efficiency means goal-directed generation to optimize state-of-the-art (in predictive accuracy) oracles is possible. For computational oracles, computational resources and model efficiency will “meet halfway”: access to more computational resources will naturally diminish the gap as sample efficiency improves. An example of such a computational oracle is MD simulations. However, in the long term, it is natural to strive towards being able to directly optimize (wet-lab) experiments (for example, together with a robotic platform) where the sample efficiency bottleneck is even more unforgiving. An acceptable oracle budget in this case may be in the hundreds, and there are no present models with sufficient sample efficiency for this. Nonetheless, it is useful to keep the end-goal in perspective.

We end this blog with an outlook on future work to improve sample efficiency. Recently, we proposed Beam Enumeration [39] as a general methodology to exhaustively enumerate partial SMILES and extract molecular substructures from them. Using these extracted substructures, self-conditioned generation can be achieved by filtering successively sampled batches of molecules to contain these specific substructures and discarding the rest. The effect is drastically improved sample efficiency and we show it can be combined with Augmented Memory. Our future work will focus on pushing the limits of this framework and contributing to achieving sufficient sample efficiency.

Thank you for reading and please reach out if there are any issues or questions!

References

[1] Merk et al. Commun. Chem. 2018, 1, 68.      

[2] Yu et al. ACS Omega 2021, 6, 22945.

[3] Grisoni et al. Sci. Adv. 2021, 7.

[4] Moret et al. Angew. Chem. 2021, 60, 19477.

[5] Jang et al. Front. Mol. Biosci. 2022, 9.      

[6] Eguida et al. J. Med. Chem. 2022, 65, 13771.

[7] Li et al. Nat. Commun. 2022, 13, 6891.

[8] Chen, Yang et al. RSC Adv. 2022, 12, 22893.

[9] Tan et al. J. Med. Chem. 2022, 65, 103.      

[10] Hua et al. J. Chem. Inf. Model. 2022, 62, 1654.

[11] Moret et al. Nat. Commun. 2023, 14, 114.    

[12] Song et al. Eur. J. Med. Chem. 2023, 247, 115034.

[13] Ballarotto et al. J. Med. Chem. 2023, 66, 8170.

[14] Atz et al. Deep interactome learning for de novo drug design. ChemRxiv 2023.

[15] Korshunova et al. Commun. Chem. 2022, 5, 129.

[16] Yoshimori et al. ChemMedChem 2021, 16, 955.

[17] Zhavoronkov et al. Nat. Biotechnol. 2019, 37, 1038.

[18] Ren et al. Chem. Sci. 2023, 14, 1443. 

[19] Li et al. J. Med. Chem. 2023, 66, 5439. 

[20] Zhu et al. Bioorg. Med. Chem. 2023, 91, 117414.

[21] Salas-Estrada et al. J. Chem. Inf. Model. 2023, 63, 5056.

[22] Trott & Olson J. Comput. Chem. 2010, 31, 455.

[23] Wang et al. J. Am. Chem. Soc. 2015, 137, 2695.

[24] Guo & Schwaller Augmented Memory: Capitalizing on Experience Replay to Accelerate De Novo Molecular Design. ChemRxiv 2023.

[25] Olivecrona et al. J. Cheminformatics 2017, 9.

[26] Blaschke et al. J. Chem. Inf. Model. 2020, 60, 5918.

[27] Hochreiter & Schmidhuber Neural Comput. 1997, 9, 1735.

[28] Weininger J. Chem. Inf. Comput. Sci. 1988, 28.

[29] Atance et al. J. Chem. Inf. Model. 2022, 62, 4863.

[30] Thomas et al. J. Cheminformatics 2022, 14.

[31] Bjerrum, SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules. ArXiv 2017.

[32] Gao, Fu et al. Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization NeurIPS Track Datasets and Benchmarks, 2022.

[33] Guo et al. J. Cheminformatics 2021, 13.

[34] Brown et al. J. Chem. Inf. Model. 2019, 59, 1096.

[35] Polykovskiy et al. Front. Pharmacol. 2020, 11.

[36] Flam-Shepherd et al. Nat. Commun. 2022, 13, 3293.

[37] Schneuing, Du et al. Structure-based Drug Design with Equivariant Diffusion Models. ArXiv 2023.

[38] Igashov et al. Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design. ArXiv 2023.

[39] Guo & Schwaller Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design. ArXiv 2023.
