DeepGCNs: Making GCNs Go as Deep as CNNs

Li, Guohao; Mueller, Matthias; Qian, Guocheng; Perez, Itzel Carolina Delgadillo; Abualshour, Abdulellah; Thabet, Ali; Ghanem, Bernard

doi:10.1109/tpami.2021.3074057

Cited by 96 publications

(82 citation statements)

References 75 publications


“…In computer vision, GCNs have been successfully applied to scene graph generation [22,31,38,52,56], 3D understanding [16,29,49,51], and action recognition in video [20,53,55]. In MAAS we design a DeepGCN-like architecture [27,28,30] that addresses a special scenario, namely the multi-modal nature of audiovisual data. We rely on the well-known EdgeConv operator [49] to model interactions between different modalities on graph nodes identified across multiple frames.…”

Section: Related Work (mentioning)

confidence: 99%

MAAS: Multi-modal Assignation for Active Speaker Detection

León-Alcázar, Heilbron, Thabet et al. 2021

Preprint

Self Cite


Active speaker detection requires a solid integration of multi-modal cues. While individual modalities can approximate a solution, accurate predictions can only be achieved by explicitly fusing the audio and visual features and modeling their temporal progression. Despite the inherently multi-modal nature of the task, current methods still focus on modeling and fusing short-term audiovisual features for individual speakers, often at frame level. In this paper we present a novel approach to active speaker detection that directly addresses the multi-modal nature of the problem and provides a straightforward strategy in which independent visual features from potential speakers in the scene are assigned to a previously detected speech event. Our experiments show that a small graph data structure built from a single frame makes it possible to approximate an instantaneous audio-visual assignment problem. Moreover, the temporal extension of this initial graph achieves a new state of the art on the AVA-ActiveSpeaker dataset with a mAP of 88.8%.


“…resentation fits the most with deep learning is not straightforward and remains an open problem [22,25,14]. Recent advances in graph convolution networks [14] suggest that graph representations could provide better features for point cloud processing. Such a representation already outperforms the state-of-the-art in many other computer vision tasks [32,19,28,34].…”

Section: R-GCN C-GCN (mentioning)

confidence: 99%

PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement

Zarzar, Giancola, Ghanem 2019

Preprint

Self Cite


In autonomous driving pipelines, perception modules provide a visual understanding of the surrounding road scene. Among the perception tasks, vehicle detection is of paramount importance for safe driving, as it identifies the position of other agents sharing the road. In our work, we propose PointRGCN: a graph-based 3D object detection pipeline based on graph convolutional networks (GCNs) which operates exclusively on 3D LiDAR point clouds. To perform more accurate 3D object detection, we leverage a graph representation that performs proposal feature and context aggregation. We integrate residual GCNs in a two-stage 3D object detection pipeline, where 3D object proposals are refined using a novel graph representation. In particular, R-GCN is a residual GCN that classifies and regresses 3D proposals, and C-GCN is a contextual GCN that further refines proposals by sharing contextual information between multiple proposals. We integrate our refinement modules into a novel 3D detection pipeline, PointRGCN, and achieve state-of-the-art performance on the easy difficulty for the bird's-eye-view detection task.


“…To better represent locality, we leverage the power of graphs and specifically Graph Convolutional Networks (GCNs). GCNs are considered a versatile tool to process non-Euclidean data, and recent research on point cloud semantic and part segmentation shows their power in encoding local and global information [25,13,12]. In this paper, we use GCNs to design novel point cloud upsampling modules (refer to Figure 1), which are better equipped at encoding local information and learn to generate new point patches instead of merely replicating parts of the input.…”

Section: Mgcn Clone Ns (mentioning)

confidence: 99%

“…To learn better hierarchical feature representations, graph pooling methods such as DIFFPool [29] and SAGPooling [11] have been proposed. Recently, Li et al. [13,12] introduced residual/skip connections and dilated convolutions to GCNs, and successfully trained high-capacity GCN architectures over 100 layers in depth. Previous GCN works mainly investigate discriminative models for node classification or graph classification tasks.…”

Section: Related Work (mentioning)

confidence: 99%

PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks

Qian, Abualshour, Li et al. 2019

Preprint

Self Cite


Upsampling sparse, noisy, and non-uniform point clouds is a challenging task. In this paper, we propose 3 novel point upsampling modules: Multi-branch GCN, Clone GCN, and NodeShuffle. Our modules use Graph Convolutional Networks (GCNs) to better encode local point information. Our upsampling modules are versatile and can be incorporated into any point cloud upsampling pipeline. We show how our 3 modules consistently improve state-of-the-art methods in all point upsampling metrics. We also propose a new multi-scale point feature extractor, called Inception DenseGCN. We modify current Inception GCN algorithms by introducing DenseGCN blocks. By aggregating data at multiple scales, our new feature extractor is more resilient to density changes along point cloud surfaces. We combine Inception DenseGCN with one of our upsampling modules (NodeShuffle) into a new point upsampling pipeline: PU-GCN. We show both qualitatively and quantitatively the advantages of PU-GCN against the state-of-the-art in terms of fine-grained upsampling quality and point cloud uniformity. The source code of this work is available at https://github.com/guochengqian/PU-GCN.
