
Maze: A visual analogy for graph neural networks

Graph theory and graph neural networks (GNNs) can be scary at first: so many architectures, and a plethora of mathematical concepts and theorems drawn from discrete math, combinatorics, and spectral theory.

But we face graphs every day whenever we travel in a city or try to find something at the grocery store. I hope to convince you that graphs and GNNs are intuitive with the Maze analogy.

Although mazes are mathematically very complex, we all know how to navigate them, from tracing mazes in books as kids to finding our way out of an Ikea as adults. The same holds for graphs, which brings me to propose the top 7 strategies for navigating a maze: walking, coloring the way, squeezing through, listening, using a map, destroying walls, or using wings.

But stopping at the analogy was not enough; we wanted to build something more concrete. Therefore, based on this vision and these 7 strategies, we decided to build the Graphium library to make GNNs more powerful, expressive, and modular.

👣 1. Walking

I know how to do that! 🚶‍♀️

You’re standing in a corner and looking around. You choose a direction to walk in, but you close your eyes until you reach the next corner 🧑‍🦯. That’s how the earliest GNNs worked, including GCN, GIN, and GAT. But if you keep your eyes open, you will see the edge features of the graph, which most modern message-passing networks use. These edge features can be combined with node features (MPNN, PNA) or used to gate the messages (GINE, GatedGCN).
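
If you’d rather see the walk as code, here is a minimal NumPy sketch of one message-passing step with mean aggregation, optionally mixing in edge features along the way. It is a toy illustration of the idea under simplified assumptions, not the API of any of the papers or libraries mentioned here.

```python
import numpy as np

# Toy maze: 4 "corners" (nodes) connected by corridors (edges).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]            # undirected corridors
node_feats = np.random.rand(4, 8)                   # what each corner looks like
edge_feats = {e: np.random.rand(8) for e in edges}  # what each corridor looks like

def message_passing_step(node_feats, edges, edge_feats=None):
    """One 'walk': every node averages the messages coming from its neighbors.
    If edge features are given, they are added to the message (eyes open)."""
    new_feats = np.zeros_like(node_feats)
    degree = np.zeros(len(node_feats))
    for (u, v) in edges:
        for src, dst in [(u, v), (v, u)]:            # walk in both directions
            msg = node_feats[src]
            if edge_feats is not None:               # look at the corridor too
                msg = msg + edge_feats[(u, v)]
            new_feats[dst] += msg
            degree[dst] += 1
    return new_feats / degree[:, None]               # mean aggregation

node_feats = message_passing_step(node_feats, edges, edge_feats)
```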

But even with edge features, different corners can still look the same due to the graph isomorphism problem. It is easy to get lost (over-smoothing) and not reach the desired destination (over-squashing). But don’t worry, there are strategies to help you express your surroundings more easily (expressive GNNs).

🎨 2. Color the floor

Hopefully GNNs are not colorblind like me 👓

Now, you are given some paint and a brush. What can you do? One obvious method is to use different colors to paint the floor along your walks to help identify different substructures.

Graph coloring is a well-known technique for studying various properties of graphs. It is especially useful for the Weisfeiler-Lehman graph isomorphism test, one of the most commonly used measures of GNN expressivity. These colors can be used purely as references, like street names, which have no meaning except to identify the area you are in (RWPE). Or they can be used to change the message-passing framework and increase communication within a given structure (SMP, GSN, SIN).
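
As a concrete example of colors that act only as references, here is a small NumPy sketch in the spirit of random-walk positional encodings (RWPE): each node’s “color” is the probability of a random walk returning to it after 1 to k steps. This is a simplified illustration, not the exact formulation of any specific library.

```python
import numpy as np

# Toy maze: adjacency matrix of a 5-node graph (a square with one extra corridor).
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 1],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=float)

def random_walk_colors(A, k=4):
    """For each node, the probability of returning to it after 1..k random-walk
    steps, i.e. the diagonals of the powers of the transition matrix."""
    P = A / A.sum(axis=1, keepdims=True)   # random-walk transition matrix
    Pk, colors = np.eye(len(A)), []
    for _ in range(k):
        Pk = Pk @ P
        colors.append(np.diag(Pk))         # "return" probabilities = node colors
    return np.stack(colors, axis=1)        # shape: (num_nodes, k)

print(random_walk_colors(A))
```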

But there are so many colors that they can be confusing, wasteful, and prone to over-fitting! And how do you decide which structures to color 🤔? Some fields, like molecular learning, have a well-defined set of interesting structures, but for more complex graphs, the choice gets more complicated.

🌉 3. Squeeze and bridge

🎵 London bridge is falling down 🎶

Some paths connect different neighborhoods or communities in the graph/maze, but they are narrow; they are information bottlenecks and increase over-squashing. Imagine navigating an Ikea: there are shortcuts that let you move from one side to the other, but they’re hidden and narrow, so most people don’t use them.

Mazes don’t usually have secret doors or bridges, so I will switch over to the example of moving goods from city to city. We can increase the hidden dim of bottlenecks, just like highways add more lanes, or like trucks that squeeze objects to ship them long distances. Or we can add parallel paths to the bottlenecks, for example by building a new bridge next to the old one.

But how do you find these paths? And build the bridges at the right place? It’s perhaps worth looking into expander graph propagation (EGP), or into DIGL and SDRF, which aim to counteract large curvatures. These are some of the least explored solutions in machine learning, since varying hidden dimensions are hard or inefficient to implement and might not correlate positively with test accuracy. But as discussed below, they are some of the most used techniques in transport optimization.

When transporting objects across long distances, there are two solutions. The first is to build larger roads or maze corridors, which represent the increased hidden dim. The other is to squeeze the objects into a truck 🚚 or train 🚂🚃🚃 and unpack them when they reach their destination, but this is tricky to implement. Using this strategy naively means that instead of you getting the shoes 👟 you ordered, and your neighbor getting his TV 📺, you will both get a smoothie made of blended shoes and TVs.
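
To make the smoothie problem concrete, here is a purely illustrative NumPy sketch (not any published method): two messages naively squeezed through a narrow bottleneck and unpacked on the other side can no longer be recovered individually.

```python
import numpy as np

rng = np.random.default_rng(0)
shoes = rng.random(8)                           # an 8-d "package" (a node's message)
tv = rng.random(8)                              # your neighbor's package

W_pack = rng.random((8, 2))                     # squeeze into a narrow truck (8 -> 2 dims)
W_unpack = rng.random((2, 8))                   # unpack at the destination (2 -> 8 dims)

delivered = (shoes + tv) @ W_pack @ W_unpack    # naive squeeze-and-unpack of both packages
print(np.allclose(delivered, shoes),            # False
      np.allclose(delivered, tv))               # False: everyone gets the same smoothie
```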

🔉 4. Listen to the shape

If you were a bat 🦇, this is how you would navigate the maze

Knock, yell, and listen to the echo: spectral methods look at the resonance of different sub-structures, or estimate their distance from the signal that bounces back, much like sonar.
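
In code, “listening to the shape” roughly means looking at the spectrum of the graph Laplacian, whose eigenvalues play the role of resonance frequencies. Here is a tiny NumPy sketch (heavily simplified; it ignores everything that makes ChebNet and friends practical):

```python
import numpy as np

def spectrum(A):
    """'Knock' on the maze: the eigenvalues of the Laplacian L = D - A
    are the resonance frequencies of the graph."""
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))

# Two small mazes with the same number of corners but different shapes.
cycle = np.roll(np.eye(4), 1, axis=1) + np.roll(np.eye(4), -1, axis=1)  # 4-node loop
path = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)                 # 4-node corridor

print(spectrum(cycle))  # ~[0, 2, 2, 4]
print(spectrum(path))   # a different echo: the two shapes "sound" different
```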

But as we see in the spectral-methods literature (ChebNet, CayleyNet), the relations are hard to understand: different structures can sound similar, and similar structures can sound different. Location becomes ambiguous since wavelets are meant to be translation invariant (at least in a Euclidean space). And let’s not forget about the computational costs.

🗺️ 5. Use a map

Obviously… What’s next, a GPS?📍

If I give you a map of the maze, or a GPS, or better a GraphGPS, then problem solved! You can go anywhere!

Positional encodings, such as Laplacian eigenvectors or distance matrices, provide a map of the maze. Some papers propose to navigate the map using the main cardinal directions (DGN), to simply give each node’s position to the node (SAN, LSPE), or to use relative distances to bias connectivity (Graphormer). And what about having a very deep GNN to learn the map as done in GPSE? 🤯
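
As a rough idea of what such a map looks like, here is a small NumPy sketch of Laplacian-eigenvector positional encodings: each node gets entries of the first non-trivial eigenvectors of the normalized Laplacian as coordinates. This is simplified; real implementations also handle sign flips and repeated eigenvalues.

```python
import numpy as np

def laplacian_map(A, k=2):
    """Give every node a k-dimensional 'position on the map' from the
    eigenvectors of the normalized Laplacian."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)       # sorted by eigenvalue
    return eigvecs[:, 1:k + 1]                 # skip the trivial first eigenvector

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(laplacian_map(A))   # each row: a 2-d map position for that node
```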

However, having a good map is difficult, and there’s no consensus about which encoding is better for each type of graph or problem. Some parts of the Laplacian encoding are difficult to interpret; distance matrices are too sensitive to small changes in connectivity, while random-walk probabilities are not sensitive enough to them.

⚒️ 6. Destroy walls

🏇 The Great Wall didn’t stop Genghis Khan, why would a small wall stop you?

Why even navigate the maze when you can just tear down any wall at any time? And rewire it any way you like? This is what methods like DRew, νDRew, or spectral modification aim to do.
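
As a crude stand-in for these rewiring methods (this is not how DRew or spectral modification actually work), here is a NumPy sketch that simply destroys walls by connecting every pair of nodes that are at most k hops apart:

```python
import numpy as np

def rewire_k_hops(A, k=2):
    """Naive wall destruction: connect every pair of nodes that are
    at most k hops apart in the original maze."""
    reach = np.eye(len(A))
    hop = np.eye(len(A))
    for _ in range(k):
        hop = (hop @ A > 0).astype(float)   # nodes reachable in one more step
        reach = np.maximum(reach, hop)
    np.fill_diagonal(reach, 0)              # no self-loops
    return reach

# A 5-node corridor (path graph): rewiring adds shortcuts between 2-hop neighbors.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(rewire_k_hops(A, k=2))
```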

But destroy too many walls, and your maze falls apart, meaning that you lose all the inductive bias given by the graph structure. 🔥💥 This can impact some graph types far more than others. For instance, if you work with molecular graphs, it’s difficult to make sense of any kind of rewiring, unless it relates to a contact point during molecular folding.

🪶 7. Fly above

Is it a bird 🐦? Is it a plane 🛩️? Is it Icarus 👼? No! It’s a graph Transformer!

Why bother with the maze at all? Let’s learn from Icarus and build ourselves some wings to fly over the maze. This is what graph Transformers attempt by allowing information to move from one point to any other point in the maze/graph using a fully connected Attention module (SAN, Graphormer).
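
Stripped of positional encodings and attention biases, the wings are just full self-attention over all nodes. A minimal single-head NumPy sketch, purely illustrative of the idea behind SAN- and Graphormer-style layers:

```python
import numpy as np

def full_attention(H, Wq, Wk, Wv):
    """Fly over the maze: every node attends to every other node,
    producing an N x N attention map (hence the O(N^2) cost)."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # N x N: all pairs of nodes
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)           # softmax over all nodes
    return attn @ V

rng = np.random.default_rng(0)
N, d = 6, 8                                           # 6 nodes with 8-d features
H = rng.random((N, d))                                # node features + positional encodings
Wq, Wk, Wv = (rng.random((d, d)) for _ in range(3))
print(full_attention(H, Wq, Wk, Wv).shape)            # (6, 8)
```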

But learning to fly properly is complex (O(N²) complexity to be precise), and without a good map and good coloring, you will get lost. Before flying, learn to read the map with a few layers directly on top of the positional and structural encodings.

Never forget: what happened to Icarus can also happen to your graph Transformer; it can fail dramatically.

So remind yourself that walking (strategy #1) is still beneficial: it lowers memory and time requirements and reduces overfitting by enforcing the bias of the graph structure, as demonstrated by GraphGPS and GPS++, the winner of the last OGB large-scale challenge.

♾️ Graphium

I hope this clarifies recent advances in GNNs and brings intuition into a field with lots of maths ➕➖✖️➗. But if it’s still hard to wrap your head around it all, don’t worry, we did the groundwork in Graphium: an open-source library that combines SOTA GNNs and positional encodings in an easily configurable format, in the hopes of training foundational models for molecular graphs 🚀. All the best strategies in one library 📚.

Wait, you said foundational models for graphs? Does that mean that there are now ultra-large and diverse datasets available to the community? That’s right, check out our latest dataset paper covering nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. This is about 3️⃣0️⃣0️⃣0️⃣❎ more labels than OGB’s large-scale challenge.

Thinking of another strategy to navigate a maze/graph? Feel free to message me, or to implement it in Graphium 😃.

I would like to thank Frederik Wenkel, Andy Huang, and Shawn Whitfield for helping me write and revise this blog post.
