PTM-Mamba: A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks

One of my gripes about models trained on crystal structures is that they only capture a particular, somewhat artificial view of the protein (because the molecules need to pack together nicely to form a crystal, you often lose things like disordered regions). Another issue is that we express proteins in organisms that are convenient for biotech (E. coli or yeast), but those hosts don't add all the post-translational modifications (PTMs), like sugars and lipids, that can affect a protein's function.

Protein language models (PLMs) have a chance to get around some of these problems, so it's neat to see a preprint a) taking advantage of the latest tech (I need to learn more about Mamba...) and b) introducing PTM tokens into PLM training. You'd expect this approach to beat ESM-2 or other PLMs on PTM-specific tasks, and apparently it does.
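
For intuition (this is my own toy sketch of the idea, not code or token names from the preprint), "PTM tokens" basically means extending the amino-acid vocabulary with special tokens, so a modified residue gets its own token ID instead of being collapsed into the plain residue. The token names and the little encode helper below are made up for illustration:

```python
# Toy sketch: extend an amino-acid vocabulary with PTM tokens.
# Token names like "<pS>" (phosphoserine) are hypothetical and only
# illustrate the concept; the actual PTM-Mamba vocabulary may differ.

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
PTM_TOKENS = ["<pS>", "<pT>", "<pY>", "<N-glyc>", "<K-ac>"]  # hypothetical

# One shared vocabulary: standard residues plus PTM tokens.
vocab = {tok: i for i, tok in enumerate(AMINO_ACIDS + PTM_TOKENS)}

def encode(residues):
    """Map a list of residue/PTM symbols to integer token IDs."""
    return [vocab[r] for r in residues]

# A sequence where one serine is phosphorylated: the modified position
# is represented by its PTM token rather than the unmodified residue.
seq = ["M", "K", "<pS>", "T", "Y"]
print(encode(seq))
```

The upshot is that the model can learn different embeddings for, say, phosphoserine vs. plain serine, which is exactly the information a structure-only or sequence-only model never sees.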

Maybe if we're lucky, Pranam Chatterjee or one of his students will write a blog post breaking things down for us ;)
