Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences

The authors fine-tuned a protein language model (ProGen2-base) on an assembled dataset of more than 1.2 million CRISPR-Cas operons, generated 4 million sequences, and evaluated the diversity of the generated samples. They also generated sequences to be produced in vitro and selected 209 for characterization; the top variants showed improved activity and/or specificity. Because CRISPR editing can have significant off-target effects that limit the conclusions that can be drawn from experiments, this approach could be very useful for designing better gene-editing tools.
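
To make "evaluating the diversity of generated samples" concrete, here is a minimal sketch of two common diversity measures: mean pairwise identity within a set, and the highest identity of a generated sequence to any reference sequence. This is not the paper's pipeline. The function names and the toy sequences are hypothetical, and the sketch assumes the sequences are already aligned to equal length; a real analysis would use a proper alignment or clustering tool on full-length proteins.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs: list[str]) -> float:
    """Average identity over all unordered pairs; lower values mean a more diverse set."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


def max_identity_to_references(seq: str, references: list[str]) -> float:
    """Identity to the closest reference; lower values mean the sequence is more novel."""
    if not references:
        raise ValueError("need at least one reference sequence")
    return max(identity(seq, ref) for ref in references)


if __name__ == "__main__":
    # Toy, made-up peptide fragments (assumption: stand-ins for aligned protein sequences).
    natural = ["MKRNYILGLD", "MKRNYVLGLD"]
    generated = ["MKRNYILGLA", "MARNFILGLD", "MKKNYILGID"]

    print("mean pairwise identity (generated):", mean_pairwise_identity(generated))
    for seq in generated:
        print(seq, "closest natural identity:", max_identity_to_references(seq, natural))
```

A low mean pairwise identity suggests the model is not collapsing onto a few sequences, while a low maximum identity to the reference set suggests the samples are not simply copies of the training data.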
