Neural Graph Matching Networks for Fewshot 3D Action Recognition

Guo, Michelle; Chou, Edward; Huang, De-An; Song, Shuran; Yeung, Serena; Li, Feifei

doi:10.1007/978-3-030-01246-5_40

Cited by 93 publications

(61 citation statements)

References 33 publications

“…Moreover, the number of applicable directions of GNNs in computer vision is still growing. It includes human-object interaction [144], few-shot image classification [145], [146], [147], semantic segmentation [148], [149], visual reasoning [150], and question answering [151].…”

Section: Practical Applications (mentioning)

confidence: 99%

A Comprehensive Survey on Graph Neural Networks

Pan

Chen

et al. 2021

IEEE Trans. Neural Netw. Learning Syst.

6,220

2,977

Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.

“…NCMN: Neural Graph Matching Network. In [43], a Neural Graph Matching Network (NGMN) is proposed for few-shot 3D action recognition, where 3D data are represented as interaction graphs. A GCN is applied for updating node features in the graphs and an MLP is employed for updating the edge strength.…”

Section: GNN-based Graph Matching Network (mentioning)

confidence: 99%
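
The statement above describes a GCN step that updates node features alternating with an MLP step that updates edge strengths. A minimal NumPy sketch of that pattern follows. It is not the authors' implementation: the function names, layer sizes, symmetric normalization, and the choice of |h_i - h_j| as the MLP input are all assumptions made for illustration.

```python
import numpy as np

def gcn_node_update(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) non-negative edge strengths, H: (n, d) node features,
    W: (d, d_out) weight matrix (hypothetical shapes).
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)                   # weighted degree, >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU

def mlp_edge_update(H, W1, b1, W2, b2):
    """Score every node pair with a two-layer MLP on |h_i - h_j|.

    Returns an (n, n) symmetric matrix of sigmoid-squashed edge strengths.
    """
    diff = np.abs(H[:, None, :] - H[None, :, :])   # (n, n, d)
    hidden = np.maximum(diff @ W1 + b1, 0.0)        # (n, n, k)
    logits = (hidden @ W2 + b2).squeeze(-1)         # (n, n)
    return 1.0 / (1.0 + np.exp(-logits))

# Toy interaction graph: 4 nodes, 8-dim features, fully connected to start.
rng = np.random.default_rng(0)
n, d, k = 4, 8, 16
H = rng.normal(size=(n, d))
A = np.ones((n, n)) - np.eye(n)
W = 0.1 * rng.normal(size=(d, d))
W1, b1 = 0.1 * rng.normal(size=(d, k)), np.zeros(k)
W2, b2 = 0.1 * rng.normal(size=(k, 1)), np.zeros(1)

for _ in range(2):                 # alternate node and edge updates
    H = gcn_node_update(A, H, W)
    A = mlp_edge_update(H, W1, b1, W2, b2)
```

In a trained model the weights would be learned end to end against the few-shot objective; here they are random and only the data flow is shown.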

“…Since graph data usually has complex structure, how to learn a metric so that it can facilitate generalizing from a few graph examples is a big challenge. Some recent work [43] has begun to explore the few-shot 3D action recognition problem with graph-based similarity learning strategies, where a neural graph matching network is proposed to jointly learn a graph generator and a graph matching metric function to optimize the few-shot learning objective of 3D action recognition. However, since the objective is defined specifically based on the 3D action recognition task, the model can not be directly used for other domains.…”

Section: Few-shot Learning (mentioning)

confidence: 99%

Deep Graph Similarity Learning for Brain Data Analysis

Ahmed

Willke

et al. 2019

Proceedings of the 28th ACM International Conference on Information and Knowledge Management

In many domains where data are represented as graphs, learning a similarity metric among graphs is considered a key problem, which can further facilitate various learning tasks, such as classification, clustering, and similarity search. Recently, there has been an increasing interest in deep graph similarity learning, where the key idea is to learn a deep learning model that maps input graphs to a target space such that the distance in the target space approximates the structural distance in the input space. Here, we provide a comprehensive review of the existing literature of deep graph similarity learning. We propose a systematic taxonomy for the methods and applications. Finally, we discuss the challenges and future directions for this problem.

“…For example, Liu et al [25] adopted unary fluents to represent attributes of a single object, and binary fluents for two objects in egocentric videos, and then they used LSTM [11] to recognize which action is performed. In addition, Recurrent Neural Networks (RNN) [16] or Graph Convolutional Networks (GCN) [12,18,31,50] is used for structured video representation and action recognition in 2D or 3D scenes. Due to the absence of rules for logical reasoning, the explainability of these methods is limited.…”

Section: Related Work (mentioning)

confidence: 99%

“…The popular two-stream convolutional networks [3,9,41,44] can capture the complementary information on appearance from still frames and motion between frames. Besides, spatio-temporal graphs with Recurrent Neural Networks (RNN) [16] or Graph Convolutional Networks (GCN) [12,18,31,50] focus on the structured video representation. Recently, with the advances of deep learning in scene graph representation [4,22,51], researchers attempt to use attributes of an object and the relationship between objects for semantic-level video content understanding.…”

Section: Introduction (mentioning)

confidence: 99%

Explainable Video Action Reasoning via Prior Knowledge and State Transitions

Zhuo

Cheng

Zhang

et al. 2019

Proceedings of the 27th ACM International Conference on Multimedia

Human action analysis and understanding in videos is an important and challenging task. Although substantial progress has been made in past years, the explainability of existing methods is still limited. In this work, we propose a novel action reasoning framework that uses prior knowledge to explain semantic-level observations of video state changes. Our method takes advantage of both classical reasoning and modern deep learning approaches. Specifically, prior knowledge is defined as the information of a target video domain, including a set of objects, attributes and relationships in the target video domain, as well as relevant actions defined by the temporal attribute and relationship changes (i.e. state transitions). Given a video sequence, we first generate a scene graph on each frame to represent concerned objects, attributes and relationships. Then those scene graphs are linked by tracking objects across frames to form a spatio-temporal graph (also called video graph), which represents semantic-level video states. Finally, by sequentially examining each state transition in the video graph, our method can detect and explain how those actions are executed with prior knowledge, just like the logical manner of thinking by humans. Compared to previous works, the action reasoning results of our method can be explained by both logical rules and semantic-level observations of video content changes. Besides, the proposed method can be used to detect multiple concurrent actions with detailed information, such as who (particular objects), when (time), where (object locations) and how (what kind of changes). Experiments on a re-annotated dataset CAD-120 show the effectiveness of our method.
