2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00719

Graph Convolutional Networks for Temporal Action Localization

Abstract: Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is repre…
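
To make the abstract's idea concrete, below is a minimal sketch of message passing over an action-proposal graph: each proposal becomes a node, edges link temporally overlapping proposals, and one graph-convolution layer aggregates features across neighbors. The IoU-based edge rule, the threshold, the feature sizes, and the single layer are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a graph convolution over an action-proposal graph.
# The edge rule (temporal IoU > threshold), feature sizes, and single
# layer are assumptions for illustration, not the paper's exact setup.
import numpy as np

def temporal_iou(a, b):
    """IoU of two proposals given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def build_proposal_graph(proposals, iou_thresh=0.3):
    """Adjacency matrix with self-loops: edge if temporal IoU > threshold."""
    n = len(proposals)
    A = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            if temporal_iou(proposals[i], proposals[j]) > iou_thresh:
                A[i, j] = A[j, i] = 1.0
    return A

def gcn_layer(A, X, W):
    """One graph-convolution layer: symmetric normalization, then ReLU."""
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt
    return np.maximum(A_hat @ X @ W, 0.0)

# Toy usage: 4 proposals with 8-d features, one layer producing 4-d outputs.
proposals = [(0.0, 2.0), (1.5, 3.5), (3.0, 5.0), (10.0, 12.0)]
X = np.random.randn(len(proposals), 8)   # per-proposal features
W = np.random.randn(8, 4) * 0.1          # layer weights
H = gcn_layer(build_proposal_graph(proposals), X, W)
print(H.shape)  # (4, 4)
```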

Cited by 486 publications (251 citation statements) · References 35 publications

Citation statements:
“…First, DEG could be easily combined with an algorithm to track an animal's location in an environment [2], thus allowing the identification of behaviors of interest and where those behaviors occur. Also, while the use of CNNs for classification is standard practice in machine learning, recent works in temporal action detection use widely different sequence modeling approaches and loss functions [29,32,39]. Testing these different approaches in the DEG pipeline could further improve performance.…”
Section: Discussion
confidence: 99%
“…We modeled our approach after temporal action localization methods used in computer vision aimed to solve related problems [32][33][34][35] . The overall architecture of our solution included: 1. estimating motion (optic flow) from a small snippet of video frames, 2. compressing a snippet of optic flow and individual still images into a lower dimensional set of features, 3. using a sequence of the compressed features to estimate the probability of each behavior at each frame in a video (Fig.…”
Section: Modeling Approach
confidence: 99%
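
The three-step pipeline quoted above (motion estimation, feature compression, per-frame behavior probabilities from a feature sequence) can be sketched structurally as follows. Frame differencing stands in for optic flow, a random projection stands in for the learned feature compressor, and a sliding-window softmax stands in for the sequence model; all three stand-ins are assumptions for illustration only.

```python
# Structural sketch of the quoted three-step pipeline. Frame differencing,
# the random projection, and the sliding-window softmax are stand-ins
# (assumptions) for the learned components described in the excerpt.
import numpy as np

def motion_proxy(frames):
    """Step 1: crude motion estimate as absolute frame differences."""
    diffs = np.abs(np.diff(frames, axis=0)).reshape(len(frames) - 1, -1)
    return np.vstack([diffs[:1], diffs])     # pad so output has T rows

def compress(features, dim=16, seed=0):
    """Step 2: project per-frame features to a low-dimensional space."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((features.shape[1], dim)) / np.sqrt(dim)
    return features @ P

def behavior_probs(z, n_classes=3, win=9, seed=1):
    """Step 3: per-frame class probabilities from a temporal window."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((z.shape[1], n_classes))   # dummy classifier
    logits = z @ W
    kernel = np.ones(win) / win                         # temporal smoothing
    smoothed = np.stack([np.convolve(logits[:, c], kernel, mode="same")
                         for c in range(n_classes)], axis=1)
    e = np.exp(smoothed - smoothed.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)             # softmax per frame

frames = np.random.rand(120, 32, 32)    # stand-in grayscale video
probs = behavior_probs(compress(motion_proxy(frames)))
print(probs.shape)                      # (120, 3)
```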
“…Temporal action localization aims to detect the temporal boundaries and the categories of action instances in untrimmed videos. The supervised methods [3,27,29,37,44] mainly adopt the two-stage framework, which first produces a series of temporal action proposals, then predicts the action class and regresses their boundaries. Concretely, Shou et al [29] design three segment-based 3D ConvNet to accurately localize action instances and Zhao et al [44] apply a structured temporal pyramid to explore the context structure of actions.…”
Section: Related Work 2.1 Temporal Action Localization
confidence: 99%
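
The two-stage framework summarized in this excerpt (first generate temporal proposals, then classify each proposal and regress its boundaries) can be sketched schematically as follows. The sliding-window proposal generator, its window sizes, and the dummy classifier/regressor are assumptions for illustration, not the method of any cited paper.

```python
# Schematic sketch of a two-stage temporal action localization pipeline,
# assuming per-frame "actionness" scores and per-frame features exist.
# Window sizes, thresholds, and the dummy second stage are illustrative.
import numpy as np

def generate_proposals(actionness, win_sizes=(8, 16, 32), keep=10):
    """Stage 1: slide windows over frame-level scores, rank by mean score."""
    T = len(actionness)
    candidates = []
    for w in win_sizes:
        for start in range(0, T - w + 1, w // 2):
            score = float(actionness[start:start + w].mean())
            candidates.append((start, start + w, score))
    candidates.sort(key=lambda p: p[2], reverse=True)
    return candidates[:keep]

def classify_and_regress(proposal, features):
    """Stage 2 (placeholder): predict a class and refine the boundaries."""
    start, end, _ = proposal
    cls = int(features[start:end].mean(axis=0).argmax())   # dummy classifier
    offset = 0.1 * (end - start)                           # dummy regression
    return cls, start - offset, end + offset

T = 128
actionness = np.random.rand(T)      # stand-in frame-level actionness scores
features = np.random.rand(T, 5)     # stand-in frame features, 5 classes
for prop in generate_proposals(actionness):
    print(classify_and_regress(prop, features))
```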
“…Concretely, Shou et al [29] design three segment-based 3D ConvNet to accurately localize action instances and Zhao et al [44] apply a structured temporal pyramid to explore the context structure of actions. Recently, Chao et al [3] transfer the classical Faster-RCNN framework [26] for action localization and Zeng et al [37] exploit proposal-proposal relations using graph convolutional networks. Under the weakly-supervised setting only with video-level action labels, Wang et al [32] design the classification and selection module to reason about the temporal duration of action instances.…”
Section: Related Work 2.1 Temporal Action Localization
confidence: 99%
“…GNN integrates the advantages of classical graph models and popular neural networks with a strong relation representation and feature learning ability. GNN has been used in many tasks involving relation inference, such as human-object interaction (HOI) [7,29], scene understanding [19,24], human action localization [48] and human gaze communication [22]. GNN was also used to model the different parts of a human or other objects for action recognition [45] and object tracking [8].…”
Section: Graph Neural Network
confidence: 99%