Mon, Apr 29, 3:00pm

Multimodal language models for mapping the genotype-phenotype relationship

How complex phenotypes emerge from intricate gene expression patterns is a fundamental question in biology. Quantitatively characterizing this relationship, however, is challenging because of the vast combinatorial possibilities and the dynamic interplay between genotype and phenotype landscapes. Combining high-content genotyping approaches, such as single-cell RNA sequencing, with advanced learning methods, such as language models, offers an opportunity to dissect this complex relationship. Here, I present an integrated genetics framework for analyzing and interpreting the high-dimensional landscapes of genotypes and their associated phenotypes simultaneously. We applied this approach to develop a multimodal foundation model that explores the genotype-phenotype relationship manifold of human transcriptomics at the cellular level. The results show a finer resolution of cellular heterogeneity, more precise phenotype annotation, and potential cross-tissue biomarkers that are undetectable through conventional gene expression analysis alone. Using contextualized embeddings, we investigated gene polyfunctionality, the multifaceted roles that genes play in different biological processes, and illustrate it with the VWF gene in endothelial cells. Overall, this study aims to advance our understanding of the dynamic interplay between gene expression and phenotypic manifestation, and it demonstrates the potential of integrated genetics for uncovering new dimensions of cellular function and complexity.

