2021
DOI: 10.48550/arxiv.2106.05234
Preprint

Do Transformers Really Perform Bad for Graph Representation?

Abstract: The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet, it has not achieved competitive performance on popular leaderboards of graph-level prediction compared to mainstream GNN variants. Therefore, it remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture, and could attain excel…

Cited by 41 publications (82 citation statements)
References 39 publications
“…Topology information of the data affects the network architectures significantly. Typical backbones include GCN [31], hypergraph neural network (HGNN) [32] and Transformer [33]. Velickovic et al [34] presented graph attention network (GAT), which introduced self-attention in the non-Euclidean space to the original GCN.…”
Section: B. Non-Euclidean Methods of Deep Learning
confidence: 99%
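The GAT mechanism referenced in the statement above can be summarized in a few lines. Below is a minimal, single-head sketch of the GAT attention update; the dense adjacency format, tensor shapes, and toy inputs are illustrative assumptions, not the cited implementation.

```python
# Minimal single-head GAT attention sketch (cf. Velickovic et al. [34]);
# dense adjacency and toy sizes are illustrative assumptions.
import torch
import torch.nn.functional as F

def gat_head(h, adj, W, a, negative_slope=0.2):
    """h: [N, F_in] node features, adj: [N, N] 0/1 adjacency (with self-loops),
    W: [F_in, F_out] projection, a: [2 * F_out] attention vector."""
    z = h @ W                                      # project node features
    d = z.size(1)
    # pairwise logits e_ij = LeakyReLU(a^T [z_i || z_j]), split as a1^T z_i + a2^T z_j
    e = F.leaky_relu((z @ a[:d]).unsqueeze(1) + (z @ a[d:]).unsqueeze(0), negative_slope)
    e = e.masked_fill(adj == 0, float("-inf"))     # attend only to neighbours
    alpha = torch.softmax(e, dim=1)                # normalise per target node
    return alpha @ z                               # aggregate neighbour features

# usage on a toy 3-node graph
h = torch.randn(3, 4)
adj = torch.tensor([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
out = gat_head(h, adj, torch.randn(4, 8), torch.randn(16))  # -> [3, 8]
```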
“…GNN models include GCN [16], GAT [26] and Graph-SAGE [9]; HGNN models include RGCN [23], HAN [29], HGT [12] and NIRec [14]. Transformer-based methods include Graph-Bert [34], Graph-Transformer [4] and Graphormer [32]. We adopt two widely-used evaluation metrics, AUC and Logloss [8], to evaluate the offline performance.…”
Section: Competitors and Metrics
confidence: 99%
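The two offline metrics named in the statement above, AUC and Logloss, are standard for this kind of evaluation and can be computed with scikit-learn. The snippet below is only a hedged illustration; the toy labels and scores are placeholders, not data from the cited work.

```python
# Sketch of the two offline metrics (AUC and Logloss) via scikit-learn;
# the toy labels/scores below are placeholders.
from sklearn.metrics import roc_auc_score, log_loss

y_true = [0, 1, 1, 0, 1]             # binary interaction labels
y_score = [0.2, 0.8, 0.6, 0.3, 0.9]  # model-predicted probabilities

auc = roc_auc_score(y_true, y_score)  # ranking quality, higher is better
ll = log_loss(y_true, y_score)        # probability calibration, lower is better
print(f"AUC={auc:.4f}  Logloss={ll:.4f}")
```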
“…Graph-BERT [34] introduces three types of positional encoding to embed node position information into the model: an absolute WL-PE that represents the codes assigned by the Weisfeiler-Lehman algorithm, an intimacy-based PE and a hop-based PE, both of which vary with the sampled subgraphs. Graphormer [32] utilizes centrality encoding to enhance the node features and uses spatial encoding along with edge encoding to incorporate structural inductive bias into the attention mechanism. Although these models have made great progress, they assume that the graphs are homogeneous and only have one type of edge, so their performance is limited in our setting.…”
Section: Transformers for Graph Data
confidence: 99%
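The statement above summarizes Graphormer's structural encodings. The sketch below illustrates the general idea, a degree-based centrality embedding added to the node features and a shortest-path ("spatial") bias added to the attention logits, under simplifying assumptions (single head, fixed embedding sizes, no edge encoding); it is not the authors' implementation.

```python
# Sketch of Graphormer-style structural encodings [32]: centrality embedding
# on node features plus a shortest-path-distance bias on attention logits.
# Single head, fixed embedding sizes and distance clipping are assumptions.
import torch
import torch.nn as nn

class GraphormerAttentionSketch(nn.Module):
    def __init__(self, dim, max_degree=64, max_dist=32):
        super().__init__()
        self.centrality = nn.Embedding(max_degree, dim)  # centrality encoding
        self.spatial_bias = nn.Embedding(max_dist, 1)    # spatial encoding
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, degree, spd):
        """x: [N, dim] node features, degree: [N] node degrees (long),
        spd: [N, N] shortest-path distances clipped to max_dist - 1 (long)."""
        x = x + self.centrality(degree)                  # enhance node features
        q, k, v = self.q(x), self.k(x), self.v(x)
        logits = q @ k.t() / x.size(-1) ** 0.5
        logits = logits + self.spatial_bias(spd).squeeze(-1)  # structural bias
        return torch.softmax(logits, dim=-1) @ v
```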
“…One of the challenges in device placement is defining an order for the nodes in the computation graph G. Unlike text and image data, the nodes in a graph reside in a multi-dimensional space and are linked by edges that represent connectivity [22]. One has to transform graph data from this multi-dimensional space into a sequence of nodes before the majority of DL methods can consume it.…”
Section: A. Challenges in Device Placement
confidence: 99%
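The ordering step described in the statement above, flattening a computation graph into a node sequence, can be illustrated with a topological sort. The sketch below uses networkx on a toy operator graph; both the library choice and the graph are illustrative assumptions, not the cited paper's method.

```python
# Sketch of flattening a DAG-shaped computation graph into a node sequence
# via topological sort; the toy operator graph is an illustrative assumption.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("input", "conv1"), ("conv1", "relu1"),
    ("relu1", "matmul"), ("input", "matmul"), ("matmul", "output"),
])

node_sequence = list(nx.topological_sort(G))  # one valid linear order of the ops
print(node_sequence)  # e.g. ['input', 'conv1', 'relu1', 'matmul', 'output']
```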
“…In Placeto [6], the structural information can be (partially) reflected in the sequential order in which the automatic device placement method iterates through the nodes of the computation graph. Recent work in graph representation learning [22] has shown that successfully learning the structural information of a graph helps represent the graph better. Better representations, in turn, lead to performance improvements in downstream tasks that use graph representations.…”
Section: A. Challenges in Device Placement
confidence: 99%