We are thrilled to announce that Valence, in collaboration with researchers from Graphcore and Mila, has placed 1st in the molecular property prediction (MPP) track of the Open Graph Benchmark Large-Scale Challenge at NeurIPS 2022, ahead of teams from Microsoft and NVIDIA.
Our submission was a scaled-up version of GraphGPS, our prior work with Mila, and now represents category-leading molecular property prediction performance.
About the Open Graph Benchmark
The OGB challenge is the machine learning industry’s leading test of graph network capability, presenting teams with large datasets and an associated predictive task. It enables researchers to benchmark their graph models with the ultimate goal of accelerating innovation in graph research and machine learning.
Valence competed on the PCQM4Mv2 track, which poses a molecular property prediction problem: building graph neural networks to predict the HOMO-LUMO energy gap, a quantum chemistry property, from a dataset of 3.4 million labeled molecules. Last year's competition garnered over 500 submissions from top researchers around the world, with Microsoft, Huawei, and DeepMind emerging as the 2021 champions.
We are excited for Valence, Graphcore, and Mila to emerge at the top of the OGB challenge in 2022. You can read the full technical report describing our winning architecture here.
Graph Networks for Molecular Featurization
One of the challenges when applying machine learning to molecules is finding the best way to represent the complexity of their dynamic structures in a machine-readable form. Graph representation has emerged as a promising and popular technique for featurizing molecules, with demonstrated advantages over 1D string representations (SMILES, SMARTS, SELFIES) and fingerprint-based approaches (ECFP, MACCS). This has led to many advances in scaling graph-based architectures for improved featurization and representation.
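To make the idea concrete, here is a minimal sketch of graph featurization for ethanol, with the graph built by hand rather than parsed from a SMILES string by a toolkit such as RDKit. The atom vocabulary, feature layout, and bond-order encoding are all illustrative choices, not the scheme used in our models.

```python
# Hand-built molecular graph for ethanol (heavy atoms only: C-C-O).
# Nodes are atoms carrying feature vectors; edges are bonds with features.

atoms = ["C", "C", "O"]
atom_vocab = {"C": 0, "O": 1}  # tiny one-hot vocabulary for this example
node_features = [[1 if atom_vocab[a] == i else 0 for i in range(2)] for a in atoms]

# Undirected bond list with a bond-order feature (1.0 = single bond).
edges = [(0, 1, 1.0), (1, 2, 1.0)]  # C-C and C-O

# Adjacency-list view, the form most message-passing layers consume.
adjacency = {i: [] for i in range(len(atoms))}
for u, v, order in edges:
    adjacency[u].append((v, order))
    adjacency[v].append((u, order))

print(node_features)  # [[1, 0], [1, 0], [0, 1]]
print(adjacency[1])   # [(0, 1.0), (2, 1.0)]
```

Unlike a fixed-length fingerprint, this representation keeps the full connectivity, so a network can learn which substructures matter rather than relying on pre-defined ones.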
Moreover, the success of Transformers seen in Natural Language Processing (NLP) and computer vision has led many researchers to explore the use of Graph Transformers (GTs) for related tasks, including a handful of recent projects at Valence: SAN, GraphGPS, and LRGB.
However, there are still many challenges to overcome when applying GNNs and GTs to molecular featurization. As a technical example, standard GNNs cannot distinguish molecular graphs that the first-order Weisfeiler-Lehman (WL-1) isomorphism test fails to distinguish, meaning that they can confuse molecules with similar sub-structures, a common occurrence in drug discovery. Conversely, GTs can easily overfit their positional encodings and rarely work well on the smaller datasets typically encountered in a real-world drug discovery setting.
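The WL-1 limitation is easy to demonstrate. The sketch below implements WL-1 colour refinement and applies it to a 6-cycle versus two disjoint 3-cycles: the graphs are clearly different, yet both are 2-regular, so the test (and hence a vanilla message-passing GNN) cannot tell them apart.

```python
from collections import Counter

def wl1_colors(adj, rounds=3):
    """One-dimensional Weisfeiler-Lehman colour refinement.

    adj maps each node to its neighbour list; returns the multiset of
    final colours, which WL-1 uses as the graph's signature."""
    colors = {v: 0 for v in adj}  # start from a uniform colouring
    for _ in range(rounds):
        # A node's new colour hashes its own colour together with the
        # sorted multiset of its neighbours' colours.
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in adj}
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in adj}
    return Counter(colors.values())

hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(wl1_colors(hexagon) == wl1_colors(two_triangles))  # True
```

Every node in both graphs has degree 2, so refinement never splits the colour classes and the two signatures coincide, even though the graphs are not isomorphic.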
Over the last 18 months, a great number of positional and structural encodings for GTs have been proposed to make nodes more distinguishable, addressing the above-mentioned issues and enabling more faithful molecular representations. Until GraphGPS, however, there was no principled method for organizing and combining the various graph transformer layers and positional features in a molecular graph.
Our Origins: GraphGPS
In previous work with Mila, we presented GraphGPS, a recipe for a General, Powerful and Scalable Graph Transformer. By using a diversity of positional encodings, the network became powerful enough to distinguish any molecular graph and better recognize the substructures that drive molecular activity. Moreover, its hybrid message-passing / Transformer architecture curbed the overfitting seen in pure Graph Transformers.
We saw that many GTs lacked a common understanding of what constitutes a good positional or structural encoding (i.e. a way to determine the position and surroundings of individual atoms). In GraphGPS, we studied what differentiates them and how to get the most out of both message-passing and Transformers. Our framework thus brings everything together:
Unified positional / structural encodings, grouped into local, global, or relative subtypes
Local message-passing mechanism that handles edge features
Global attention mechanism that allows full connectivity
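The recipe above can be sketched in a few lines. This is a heavily simplified, illustrative version of one hybrid layer (the real GraphGPS layer adds residual connections, normalization, edge features, and MLPs): a local message-passing branch and a global attention branch are computed in parallel and combined.

```python
import numpy as np

rng = np.random.default_rng(0)

def gps_layer(x, adj, w_local, w_q, w_k, w_v):
    """Toy hybrid layer: local message passing + global attention, summed."""
    # Local branch: mean-aggregate neighbour features, then project.
    local = np.stack([x[adj[i]].mean(axis=0) @ w_local for i in range(len(x))])
    # Global branch: full softmax self-attention over all nodes.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    global_branch = attn @ v
    return local + global_branch  # combine the two views of the graph

# Toy 4-node cycle graph with 8-dimensional node features.
d = 8
x = rng.normal(size=(4, d))
adj = [np.array([1, 3]), np.array([0, 2]), np.array([1, 3]), np.array([0, 2])]
params = [rng.normal(size=(d, d)) * 0.1 for _ in range(4)]
out = gps_layer(x, adj, *params)
print(out.shape)  # (4, 8)
```

The local branch respects bond structure while the global branch lets every atom attend to every other, which is the core intuition behind the hybrid design.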
After achieving competitive results on 16 benchmarks with GraphGPS, we scaled up our work for the 2022 Open Graph Benchmark challenge.
Partnering with Graphcore
After seeing firsthand the performance of GraphGPS on molecular featurization and property prediction tasks, we, along with our collaborators at Mila, became interested in exploring scaling laws around this framework to see if further gains could be made through scale. When looking at different hardware options, we became interested in exploring IPUs given compelling attributes for GNNs relative to GPUs.
Whereas GPUs are designed to process large, dense, and homogeneous data, graph-based molecular data tends to be the opposite:
Edges (bonds) have sparse encodings to represent the interactions between atoms (e.g. single vs double bonds)
Nodes (atoms) have sparse encodings depending on their nature (e.g. carbon, hydrogen, etc)
Connectivity between atoms is sparse since there are usually 1-4 bonds per atom
Molecular data is thus sparse in nature and, from a hardware perspective, requires irregular data movements.
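A back-of-the-envelope calculation makes the sparsity concrete. The atom and bond counts below are illustrative values for a typical drug-like molecule, not measurements from the PCQM4Mv2 dataset.

```python
# Why molecular graphs are sparse: a dense adjacency matrix is mostly
# zeros, so an edge-list (or CSR) layout touches far less memory.
n_atoms = 30   # heavy atoms in a typical drug-like molecule (illustrative)
n_bonds = 32   # with ~1-4 bonds per atom, bonds roughly track atoms

dense_entries = n_atoms * n_atoms   # full adjacency matrix
sparse_entries = 2 * n_bonds        # directed (u, v) edge pairs

density = sparse_entries / dense_entries
print(f"{density:.1%} of the dense matrix is non-zero")  # 7.1%
```

With over 90% of the dense matrix empty, hardware that handles fine-grained, irregular memory access efficiently has a structural advantage.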
This is where Valence Research saw the advantage of IPUs, which are well-suited to molecular graph data thanks to their 1,472 independent processing cores and nearly 1 GB of on-chip SRAM next to those cores, allowing extremely high-bandwidth data updates and exchange.
In practice, we saw large computational advantages: our models ran 4-5x faster than Microsoft and NVIDIA's Transformer-M models, speeding up our hyper-parameter sweeps and enabling us to ensemble 112 models.
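Ensembling itself is simple in principle: average the predictions of many independently trained models so that their uncorrelated errors cancel. The sketch below uses synthetic HOMO-LUMO gap values and a plain mean; the model count matches our submission, but the noise model and aggregation scheme are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)

n_models, n_molecules = 112, 5
true_gap = rng.uniform(2.0, 8.0, size=n_molecules)  # eV, synthetic targets
# Each "model" predicts the truth plus independent Gaussian noise.
preds = true_gap + rng.normal(0.0, 0.2, size=(n_models, n_molecules))

ensemble = preds.mean(axis=0)  # average over the 112 models
single_mae = np.abs(preds[0] - true_gap).mean()
ensemble_mae = np.abs(ensemble - true_gap).mean()
print(ensemble_mae < single_mae)  # averaging cancels independent noise
```

Averaging 112 models shrinks the independent-noise component by roughly a factor of sqrt(112), which is why large ensembles reliably beat any single member.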
You can try running one of the example Graph Neural Network models on Graphcore IPUs from our model garden using Paperspace.
What’s Next?
At Valence, we’ve been active players in graph representation learning for several years, with our work now translating into category-leading performance across an unprecedented number of predictive tasks.
To help ensure others in the industry are able to easily implement some of these same advances, we’ll be open-sourcing our internal library for graph representation learning on molecules over the coming weeks, providing users with pre-trained models that enable rich featurization for state-of-the-art molecular property prediction and generative design.
To stay up-to-date on the release of this library, and future projects from Valence Research, please sign up here.