Raman2RNA

https://www.nature.com/articles/s41587-023-02082-2

Exciting idea here: predicting high-throughput, interpretable features (scRNAseq) from a non-invasive (and hopefully, in the future, scalable) method.

The non-invasive method is Raman microscopy, which gives information about molecular composition and molecular bonds. That information is not as directly interpretable as 'omics measurements, i.e. you don't get an abundance for specific proteins or transcripts.

The authors propose two models to predict scRNAseq data from the Raman spectra:
1. use a spatially resolved targeted method (smFISH) to measure several marker transcripts, and then use these markers to match up 'pseudo pairs' between Raman and scRNAseq to train the model on
2. use an adversarial autoencoder to push both modalities into an indistinguishable latent space without any known pairings (one qualifier: they still used cell-type labels).
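To make the 'pseudo pair' idea in 1 concrete, here is a minimal sketch. It is not the authors' pipeline: plain nearest-neighbour matching in marker space is swapped in as a simple stand-in, and the names `zscore`, `pseudo_pair`, `raman_markers` and `scrna_markers` are hypothetical. It assumes you have marker values for the Raman-imaged cells (e.g. from smFISH) and the same markers from scRNAseq.

```python
import numpy as np

def zscore(x):
    """Standardize each marker (column) to zero mean and unit variance."""
    mu = x.mean(axis=0, keepdims=True)
    sd = x.std(axis=0, keepdims=True)
    # Guard against constant markers to avoid division by zero.
    return (x - mu) / np.where(sd == 0, 1.0, sd)

def pseudo_pair(raman_markers, scrna_markers):
    """For each Raman-imaged cell, return the index of the closest
    scRNAseq cell in standardized marker space (assumed stand-in for
    the matching step; not the method used in the paper)."""
    a = zscore(np.asarray(raman_markers, dtype=float))
    b = zscore(np.asarray(scrna_markers, dtype=float))
    # Squared Euclidean distances, shape (n_raman_cells, n_scrna_cells).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Each returned index picks out a full scRNAseq profile that could then serve as the training target for the spectrum of the matched Raman cell.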

Select results:

Method 1 (paired) worked decidedly better than method 2 (unpaired), yet both had impressive results by various metrics (e.g. using the smFISH data for leave-one-out evaluation).
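A rough sketch of what a leave-one-marker-out check can look like, under my own simplifying assumptions: hold out one smFISH marker, match cells to a reference using the remaining markers, read the held-out marker off the matched reference cell, and correlate it with the measured value. This only illustrates the evaluation logic; it leaves out the Raman model entirely, and `leave_one_marker_out`, `measured` and `reference` are hypothetical names.

```python
import numpy as np

def leave_one_marker_out(measured, reference):
    """measured:  (n_cells, n_markers) smFISH values for the imaged cells.
    reference: (n_ref, n_markers) scRNAseq values for the same markers.
    Returns one Pearson correlation per held-out marker."""
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n_markers = measured.shape[1]
    scores = []
    for g in range(n_markers):
        keep = [j for j in range(n_markers) if j != g]
        m = measured[:, keep]
        r = reference[:, keep]
        # Match each imaged cell to its nearest reference cell
        # using only the markers that were not held out.
        d2 = ((m[:, None, :] - r[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        predicted = reference[nearest, g]
        # Compare the predicted held-out marker with the measurement.
        scores.append(np.corrcoef(predicted, measured[:, g])[0, 1])
    return np.array(scores)
```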

For method 1, the model could be trained on a subset of the data, e.g. the destructive smFISH data was only needed for a few time points out of a longer time series; however, between-study generalization doesn't work well, likely because there isn't yet enough diverse data to train on.
Predictions of scRNAseq based on Raman massively outperformed those based on brightfield microscopy. This is a result I personally find very exciting, and it makes a lot of theoretical sense that the Raman spectra would provide the sort of information that has a closer link to scRNAseq than brightfield does. However, as a matter of baseline skepticism, I do note that there would be standard ways to push the brightfield models further (e.g. by pre-training on a larger data corpus). Definitely looking forward to seeing how this holds up to further validation and the test of time.