2021
DOI: 10.48550/arxiv.2106.05234
Preprint

Do Transformers Really Perform Bad for Graph Representation?

Abstract: The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet, it has not achieved competitive performance on popular leaderboards of graph-level prediction compared to mainstream GNN variants. Therefore, it remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture, and could attain excel…

Cited by 41 publications (82 citation statements)
References 39 publications
“…Topology information of the data affects the network architectures significantly. Typical backbones include GCN [31], hypergraph neural network (HGNN) [32] and Transformer [33]. Velickovic et al [34] presented graph attention network (GAT), which introduced self-attention in the non-Euclidean space to the original GCN.…”
Section: B. Non-Euclidean Methods of Deep Learning
confidence: 99%
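The GAT mechanism referenced in the statement above can be summarized in a few lines. Below is a minimal, single-head sketch of the GAT attention update; the dense adjacency format, tensor shapes, and toy inputs are illustrative assumptions, not the cited implementation.

```python
# Minimal single-head GAT attention sketch (cf. Velickovic et al. [34]);
# dense adjacency and toy sizes are illustrative assumptions.
import torch
import torch.nn.functional as F

def gat_head(h, adj, W, a, negative_slope=0.2):
    """h: [N, F_in] node features, adj: [N, N] 0/1 adjacency (with self-loops),
    W: [F_in, F_out] projection, a: [2 * F_out] attention vector."""
    z = h @ W                                      # project node features
    d = z.size(1)
    # pairwise logits e_ij = LeakyReLU(a^T [z_i || z_j]), split as a1^T z_i + a2^T z_j
    e = F.leaky_relu((z @ a[:d]).unsqueeze(1) + (z @ a[d:]).unsqueeze(0), negative_slope)
    e = e.masked_fill(adj == 0, float("-inf"))     # attend only to neighbours
    alpha = torch.softmax(e, dim=1)                # normalise per target node
    return alpha @ z                               # aggregate neighbour features

# usage on a toy 3-node graph
h = torch.randn(3, 4)
adj = torch.tensor([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
out = gat_head(h, adj, torch.randn(4, 8), torch.randn(16))  # -> [3, 8]
```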
“…GNN models include GCN [16], GAT [26] and Graph-SAGE [9]; HGNN models include RGCN [23], HAN [29], HGT [12] and NIRec [14]. Transformer-based methods include Graph-Bert [34], Graph-Transformer [4] and Graphormer [32]. We adopt two widely-used evaluation metrics, AUC and Logloss [8], to evaluate the offline performance.…”
Section: Competitors and Metrics
confidence: 99%
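The two offline metrics named in the statement above, AUC and Logloss, are standard for this kind of evaluation and can be computed with scikit-learn. The snippet below is only a hedged illustration; the toy labels and scores are placeholders, not data from the cited work.

```python
# Sketch of the two offline metrics (AUC and Logloss) via scikit-learn;
# the toy labels/scores below are placeholders.
from sklearn.metrics import roc_auc_score, log_loss

y_true = [0, 1, 1, 0, 1]             # binary interaction labels
y_score = [0.2, 0.8, 0.6, 0.3, 0.9]  # model-predicted probabilities

auc = roc_auc_score(y_true, y_score)  # ranking quality, higher is better
ll = log_loss(y_true, y_score)        # probability calibration, lower is better
print(f"AUC={auc:.4f}  Logloss={ll:.4f}")
```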
“…Graph-BERT [34] introduces three types of positional encoding to embed node position information into the model: an absolute WL-PE that represents the codes assigned by the Weisfeiler-Lehman algorithm, an intimacy-based PE and a hop-based PE, both of which vary with the sampled subgraphs. Graphormer [32] utilizes centrality encoding to enhance the node features and uses spatial encoding along with edge encoding to incorporate structural inductive bias into the attention mechanism. Although these models have made great progress, they assume that the graphs are homogeneous and only have one type of edge, so their performance is limited in our setting.…”
Section: Transformers for Graph Data
confidence: 99%
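The statement above summarizes Graphormer's structural encodings. The sketch below illustrates the general idea, a degree-based centrality embedding added to the node features and a shortest-path ("spatial") bias added to the attention logits, under simplifying assumptions (single head, fixed embedding sizes, no edge encoding); it is not the authors' implementation.

```python
# Sketch of Graphormer-style structural encodings [32]: centrality embedding
# on node features plus a shortest-path-distance bias on attention logits.
# Single head, fixed embedding sizes and distance clipping are assumptions.
import torch
import torch.nn as nn

class GraphormerAttentionSketch(nn.Module):
    def __init__(self, dim, max_degree=64, max_dist=32):
        super().__init__()
        self.centrality = nn.Embedding(max_degree, dim)  # centrality encoding
        self.spatial_bias = nn.Embedding(max_dist, 1)    # spatial encoding
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, degree, spd):
        """x: [N, dim] node features, degree: [N] node degrees (long),
        spd: [N, N] shortest-path distances clipped to max_dist - 1 (long)."""
        x = x + self.centrality(degree)                  # enhance node features
        q, k, v = self.q(x), self.k(x), self.v(x)
        logits = q @ k.t() / x.size(-1) ** 0.5
        logits = logits + self.spatial_bias(spd).squeeze(-1)  # structural bias
        return torch.softmax(logits, dim=-1) @ v
```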
“…One of the challenges in device placement is defining an order for the nodes in the computation graph G. Unlike text and image data, the nodes in a graph reside in a multi-dimensional space and are linked by edges that represent connectivity [22]. One has to transform graph data from this multi-dimensional space into a sequence of nodes before the majority of DL methods can consume it.…”
Section: A. Challenges in Device Placement
confidence: 99%
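The ordering step described in the statement above, flattening a computation graph into a node sequence, can be illustrated with a topological sort. The sketch below uses networkx on a toy operator graph; both the library choice and the graph are illustrative assumptions, not the cited paper's method.

```python
# Sketch of flattening a DAG-shaped computation graph into a node sequence
# via topological sort; the toy operator graph is an illustrative assumption.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("input", "conv1"), ("conv1", "relu1"),
    ("relu1", "matmul"), ("input", "matmul"), ("matmul", "output"),
])

node_sequence = list(nx.topological_sort(G))  # one valid linear order of the ops
print(node_sequence)  # e.g. ['input', 'conv1', 'relu1', 'matmul', 'output']
```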
“…In Placeto [6], the structural information can be (partially) reflected in the sequential order in which the automatic device placement method iterates through the nodes of the computation graph. Recent work in graph representation learning [22] has shown that successfully learning the structural information of a graph helps represent the graph better. Better representations, in turn, lead to performance improvements in downstream tasks that use graph representations.…”
Section: A. Challenges in Device Placement
confidence: 99%