Paper: Single-cell multiomics data integration and generation with scPairing
https://www.biorxiv.org/content/10.1101/2025.01.04.631299v1
Abstract: Single-cell multiomics technologies generate paired or multiple measurements of different cellular properties (modalities), such as gene expression and chromatin accessibility. However, multiomics technologies are more expensive than their unimodal counterparts, so fewer and smaller multiomics datasets are available. Here, we present scPairing, a variational autoencoder model inspired by Contrastive Language-Image Pre-Training (CLIP), which embeds different modalities from the same single cells into a common embedding space. We leverage this common embedding space to generate novel multiomics data following bridge integration, a method that uses an existing multiomics dataset as a bridge to link unimodal data. Through extensive benchmarking, we show that scPairing constructs an embedding space that fully captures both coarse and fine biological structures. We then use scPairing to generate new multiomics data from retina and immune cells. Furthermore, we extend scPairing to co-embed three modalities and generate a new trimodal dataset of bone marrow immune cells. Researchers can use these generated multiomics datasets to discover new biological relationships across modalities or confirm existing hypotheses without the need for costly multiomics technologies.
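The abstract says scPairing is inspired by CLIP, which pulls together embeddings of matched pairs and pushes apart mismatched ones. A minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss applied to per-cell embeddings from two modalities is shown below. It assumes each row of `z_rna` and `z_atac` comes from the same cell. The function names and the `temperature` value are hypothetical, and this is not scPairing's actual implementation, which also includes variational autoencoder components not shown here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_loss(z_rna, z_atac, temperature=0.1):
    """Hypothetical CLIP-style loss for two modalities measured in the same cells.

    z_rna, z_atac: arrays of shape (n_cells, dim); row i of each is cell i.
    """
    a = l2_normalize(z_rna)
    b = l2_normalize(z_atac)
    # Pairwise similarity matrix; the diagonal holds matched (same-cell) pairs.
    logits = a @ b.T / temperature
    idx = np.arange(logits.shape[0])
    # RNA -> ATAC: each RNA embedding should pick its own cell's ATAC embedding.
    loss_rna_to_atac = -log_softmax(logits, axis=1)[idx, idx].mean()
    # ATAC -> RNA: the symmetric direction, over columns.
    loss_atac_to_rna = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_rna_to_atac + loss_atac_to_rna)
```

As a usage check, embeddings that agree cell by cell should give a lower loss than the same embeddings with the cell pairing shuffled. Averaging both directions keeps the objective symmetric, so neither modality is treated as the anchor.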