Abstract: High-throughput perturbational experiments measure changes in a biological feature space induced by treatment with small molecules or genetic reagents. Downstream analysis then uses similarity metrics to identify biological relationships: to characterize mechanisms of action, to identify putative therapeutics, and to construct biological networks. In this talk, I present two recent works on learning representations of perturbational data. First, I consider how transformations of a dataset affect the distribution of cosine similarity, a representative and commonly used tool for computing associations on perturbational data. This analysis motivates the second work, Perturbational Metric Learning (PeML), a weakly-supervised method that uses replicate data to learn a representation better suited to downstream tasks. The learned similarity functions show substantial improvement in recovering known biological relationships, such as mechanism-of-action identification. In addition to capturing a more meaningful notion of similarity, data in the transformed basis can be used for other analysis tasks, such as classification and clustering. Representation learning with weak supervision or self-supervision is an exciting frontier for computational biology.
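To make the setup concrete, here is a minimal sketch (not the PeML method itself) of the kind of comparison the abstract describes: cosine similarity between perturbational profiles, computed in the raw feature space and then in a transformed basis. The toy data, replicate structure, and the whitening transform used as a stand-in for a learned representation are all illustrative assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two profile vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Toy data: 20 perturbations, 3 replicates each, 50 features (assumed sizes).
n_perts, n_reps, n_feats = 20, 3, 50
signal = rng.normal(size=(n_perts, n_feats))
X = np.repeat(signal, n_reps, axis=0) + 0.5 * rng.normal(size=(n_perts * n_reps, n_feats))
labels = np.repeat(np.arange(n_perts), n_reps)  # replicate labels (weak supervision)

# Cosine similarity between two replicates of the same perturbation, raw basis.
print("raw replicate similarity:", cosine_similarity(X[0], X[1]))

# Illustrative transformed basis: a whitening transform, standing in for a learned
# representation, so dominant dataset-wide directions do not inflate all similarities.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False) + 1e-3 * np.eye(n_feats)
eigvals, eigvecs = np.linalg.eigh(cov)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T  # symmetric whitening matrix
Z = Xc @ W

print("transformed replicate similarity:", cosine_similarity(Z[0], Z[1]))
```

In the actual work, the transformation is learned from replicate labels rather than fixed in advance; the whitening step above only illustrates how changing the basis changes the resulting similarity distribution.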
Tue, Nov 14, 4:00pm
Learning Representations on Biological Data with Weakly Supervised Learning