Mon, Feb 12, 4:00pm

Protein Discovery with Discrete Walk-Jump Sampling

Abstract: We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the maximum likelihood training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 35% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.
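The walk-jump procedure described in the abstract can be illustrated with a toy sketch. This is not the speaker's implementation: it assumes a tiny binary dataset, replaces the learned energy function and denoiser with the closed-form score of the Gaussian-smoothed empirical distribution, and uses plain overdamped Langevin updates. The names `smoothed_score` and `walk_jump` and all parameter values are hypothetical, chosen only for illustration.

```python
import math
import random


def smoothed_score(y, data, sigma):
    """Score (gradient of the log density) of the Gaussian-smoothed
    empirical distribution over `data`, evaluated at the noisy point y.
    Assumption: this closed form stands in for a learned model."""
    # Log-weight of each data point under an isotropic Gaussian centred on it.
    logits = [
        -sum((xj - yj) ** 2 for xj, yj in zip(x, y)) / (2 * sigma**2)
        for x in data
    ]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Score = sum_i w_i * (x_i - y) / sigma^2
    return [
        sum(w * (x[j] - y[j]) for w, x in zip(weights, data)) / sigma**2
        for j in range(len(y))
    ]


def walk_jump(data, sigma=0.5, step=0.01, n_steps=500, seed=0):
    """Walk: Langevin MCMC in the smoothed space at a single noise level.
    Jump: one-step denoising back toward the discrete data."""
    rng = random.Random(seed)
    # Start the chain from a noised data point.
    y = [xj + sigma * rng.gauss(0, 1) for xj in rng.choice(data)]
    for _ in range(n_steps):
        s = smoothed_score(y, data, sigma)
        # Overdamped (unadjusted) Langevin update, an assumption of this sketch.
        y = [
            yj + step * sj + math.sqrt(2 * step) * rng.gauss(0, 1)
            for yj, sj in zip(y, s)
        ]
    # Jump: denoised estimate x_hat = y + sigma^2 * score(y).
    s = smoothed_score(y, data, sigma)
    x_hat = [yj + sigma**2 * sj for yj, sj in zip(y, s)]
    # Project to discrete values; rounding suits this binary toy data only.
    return y, [round(v) for v in x_hat]


if __name__ == "__main__":
    toy_data = [(0, 0, 1, 1), (1, 1, 0, 0), (1, 0, 1, 0)]
    noisy, sample = walk_jump(toy_data, seed=1)
    print(sample)
```

Only one noise level `sigma` appears anywhere in the sketch, which mirrors the single-noise-level simplification the abstract highlights. For real sequences one would typically work with one-hot encodings and take an argmax per position instead of rounding.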

Speaker bio: I am a Machine Learning Scientist and Group Leader at Prescient Design • Genentech. At Prescient, my focus is on developing and applying machine learning methods to protein design. I have a PhD in Materials Science & Engineering from the University of Pennsylvania. I use multiscale modeling and machine learning to design materials and molecules. My research spans condensed matter physics, chemistry, biology, computation, and automation. I have published more than 20 scientific papers in journals such as Science Advances, ACS Nano, JACS, and Chemistry of Materials. I co-founded Atomic Data Sciences to provide data management and analysis software for nanotech companies. Previously, I was a Postdoctoral Associate at MIT working with the Lincoln Lab Supercomputing and AI groups. I was a National Defense Science & Engineering Graduate Fellow in the Shenoy group at UPenn and an affiliate scientist with Kristin Persson’s Materials Project group at Berkeley Lab.


Previous Talks