2021
DOI: 10.48550/arxiv.2111.14522
Preprint

Understanding over-squashing and bottlenecks on graphs via curvature

Abstract: Most graph neural networks (GNNs) use the message passing paradigm, in which node features are propagated on the input graph. Recent works pointed to the distortion of information flowing from distant nodes as a factor limiting the efficiency of message passing for tasks relying on long-distance interactions. This phenomenon, referred to as 'over-squashing', has been heuristically attributed to graph bottlenecks where the number of k-hop neighbors grows rapidly with k. We provide a precise description of the over-squashing phenomenon in GNNs and analyze how it arises from bottlenecks in the graph. For this purpose, we introduce a new edge-based combinatorial curvature and prove that negatively curved edges are responsible for the over-squashing issue. We also propose and experimentally test a curvature-based graph rewiring method to alleviate the over-squashing.

Cited by 30 publications (72 citation statements)
References 27 publications
“…Conveniently, the fully connected view also encompasses spectrally defined graph convolutions, such as the graph Fourier transform (Bruna et al., 2013). Nontrivial changes to N_u, such as multi-hop layers (Defferrard et al., 2016), rewiring based on diffusion (Klicpera et al., 2019) or curvature (Topping et al., 2021), and subsampling (Hamilton et al., 2017) are also supported. Lastly, the methods which dynamically alter the adjacency in a learnable fashion (Kipf et al., 2018; Wang et al., 2019; Kazi et al., 2020) can also be classified under this umbrella.…”
Section: Graph Rewiring
confidence: 99%
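To make the curvature-based rewiring idea in the excerpt above concrete, here is a minimal NumPy sketch. It scores edges with a simplified augmented Forman curvature, 4 − deg(u) − deg(v) + 3·(#triangles through the edge), as a cheap stand-in for the balanced Forman curvature that Topping et al. (2021) actually define, and greedily adds one support edge around the most negatively curved edge. The function names and the greedy selection rule are illustrative assumptions, not the paper's stochastic discrete Ricci flow algorithm.

```python
import numpy as np

def forman_curvature(A, u, v):
    """Simplified augmented Forman curvature of edge (u, v) on an
    unweighted undirected graph: 4 - deg(u) - deg(v) + 3 * (#triangles
    through the edge). Negative values flag bottleneck-like edges."""
    triangles = int((A[u] * A[v]).sum())   # common neighbours of u and v
    return 4 - int(A[u].sum()) - int(A[v].sum()) + 3 * triangles

def rewire_once(A):
    """One greedy rewiring step: locate the most negatively curved edge
    (u, v) and add a 'support' edge from a neighbour of u to v, closing
    a triangle over (u, v) and hence raising its curvature."""
    rows, cols = np.nonzero(np.triu(A))
    u, v = min(zip(rows, cols), key=lambda e: forman_curvature(A, *e))
    for w in np.nonzero(A[u])[0]:
        if w != v and not A[w, v]:
            A[w, v] = A[v, w] = 1          # new edge closes a triangle
            break
    return A

# Example: two 4-cliques joined by a single bridge; the bridge is the
# most negatively curved edge, so rewiring adds an edge across it.
A = np.zeros((8, 8), dtype=int)
A[:4, :4] = 1 - np.eye(4, dtype=int)
A[4:, 4:] = 1 - np.eye(4, dtype=int)
A[3, 4] = A[4, 3] = 1                      # bridge between the cliques
A = rewire_once(A)
```

On this barbell example the bridge scores −4 while the clique edges score +4, so the new edge lands straight across the bottleneck.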
“…Differential equations have historically played a role in designing and interpreting various algorithms in machine learning, including non-linear dimensionality reduction methods (Belkin & Niyogi, 2003; Coifman & Lafon, 2006). Chamberlain et al. (2021b) used parabolic diffusion-type PDEs to design GNNs, using graph gradient and divergence operators as the spatial differential operator, a transformer-type attention as a learnable diffusivity function ('1-neighborhood coupling' in our terminology), and a variety of time-stepping schemes to discretize the temporal dimension in this framework. Chamberlain et al. (2021a) applied a non-Euclidean diffusion equation ('Beltrami flow') to a joint positional-feature space, yielding a scheme with adaptive spatial derivatives ('graph rewiring'), and Topping et al. (2021) studied a discrete geometric PDE similar to Ricci flow to improve information propagation in GNNs. We can see the contrast between the diffusion-based methods of Chamberlain et al. (2021b,a) and GraphCON in the simple case of identity activation σ(x) = x and no residual connection (W = 0 and b = 0).…”
Section: Related Work
confidence: 99%
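To see what a diffusion-type GNN layer looks like as a time-stepping scheme, here is a minimal sketch, assuming a fixed symmetrically normalised adjacency in place of the learnable attention-based diffusivity the excerpt describes; the name diffusion_layer and the values of tau and steps are illustrative choices.

```python
import numpy as np

def diffusion_layer(A, X, tau=0.1, steps=10):
    """Explicit-Euler integration of the graph diffusion equation
    dX/dt = (A_hat - I) X, with A_hat = D^{-1/2} A D^{-1/2} the
    symmetrically normalised adjacency. Diffusion-type GNNs replace
    the fixed A_hat with a learnable (e.g. attention-based) diffusivity."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1))  # guard isolated nodes
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(steps):
        X = X + tau * (A_hat @ X - X)               # one Euler step of size tau
    return X
```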
“…Several recent works proposed Graph ML models based on differential equations coming from physics (Avelar et al., 2019; Poli et al., 2019b; Zhuang et al., 2020; Xhonneux et al., 2020b), including diffusion (Chamberlain et al., 2021b) and wave (Eliasof et al., 2021) equations, and geometric equations such as Beltrami (Chamberlain et al., 2021a) and Ricci (Topping et al., 2021) flows. Such approaches not only allow popular GNN models to be recovered as discretization schemes for the underlying differential equations, but can also, in some cases, address problems encountered in traditional GNNs, such as oversmoothing (Nt & Maehara, 2019; Oono & Suzuki, 2020) and bottlenecks (Alon & Yahav, 2021).…”
Section: Introduction
confidence: 99%
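A toy numeric check of the oversmoothing problem cited above, under the assumption of a plain row-stochastic averaging operator: on a connected graph, repeated averaging drives every feature column toward a constant, so the per-feature spread across nodes shrinks with depth. Graph size, edge probability, and step counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T                                          # undirected, no self-loops
P = A / np.maximum(A.sum(axis=1, keepdims=True), 1)  # row-stochastic averaging
X = rng.standard_normal((n, 4))                      # random node features
for k in (1, 10, 100):
    Xk = np.linalg.matrix_power(P, k) @ X            # k rounds of averaging
    print(k, np.std(Xk, axis=0).mean())              # spread shrinks with depth
```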
“…For example, the expressiveness of such GNNs is bounded by the Weisfeiler-Lehman isomorphism hierarchy [23]. Also, GNNs are known to suffer from over-squashing [24], where information propagated between distant nodes is distorted. Due to these limitations, the node embeddings created by GNNs have limited expressiveness.…”
Section: A. Challenges in Device Placement
confidence: 99%
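The bottleneck intuition behind over-squashing, as stated in the abstract above (the number of k-hop neighbors grows rapidly with k), is easy to quantify. In a complete binary tree the k-hop frontier of the root doubles at every hop, yet message passing must compress all of those messages into the root's fixed-size feature vector. The BFS helper below is an illustrative sketch, with the tree depth chosen arbitrarily.

```python
import numpy as np

def khop_counts(A, root, K):
    """Number of nodes at exactly k hops from `root`, for k = 1..K (BFS)."""
    frontier, seen, counts = {root}, {root}, []
    for _ in range(K):
        frontier = {int(v) for u in frontier for v in np.nonzero(A[u])[0]} - seen
        seen |= frontier
        counts.append(len(frontier))
    return counts

# Complete binary tree of depth 4 (31 nodes, root 0): the k-hop frontier
# doubles every hop, so exponentially many messages squeeze through the
# root's fixed-size representation.
n = 31
A = np.zeros((n, n), dtype=int)
for child in range(1, n):
    parent = (child - 1) // 2
    A[parent, child] = A[child, parent] = 1
print(khop_counts(A, root=0, K=4))  # -> [2, 4, 8, 16]
```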