2018
DOI: 10.1007/978-3-030-01228-1_25

Videos as Space-Time Region Graphs

Cited by 631 publications (513 citation statements)
References 64 publications
“…3D-Conv based Methods. 3D-Conv based methods include I3D [2], Non-local I3D [34] and the previous state-of-the-art Non-local I3D + GCN [35]. Although Non-local I3D + GCN leverages multiple techniques, including extra data (MSCOCO) and extra spatio-temporal features (I3D), its performance is still inferior to ours.…”
Section: Comparison With Different Convolutions on Something-Something
confidence: 99%
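The 3D-Conv methods named above (I3D and its variants) all rest on the same primitive: a convolution whose kernel slides over time as well as space. As a minimal illustrative sketch — not the actual I3D implementation, which stacks many such layers with learned multi-channel kernels — a single-channel "valid" 3D convolution can be written as:

```python
import numpy as np

def conv3d(video, kernel):
    """Naive valid-mode 3D cross-correlation over a single-channel video
    of shape (T, H, W) with a kernel of shape (kt, kh, kw).
    Real networks (e.g. I3D) use many learned multi-channel kernels;
    this loop version only shows the space-time sliding-window idea."""
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # weighted sum over a space-time neighborhood
                out[t, i, j] = np.sum(video[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

video = np.ones((4, 5, 5))           # toy clip: 4 frames of 5x5 pixels
kernel = np.ones((2, 3, 3)) / 18.0   # averaging filter over 2x3x3 window
feat = conv3d(video, kernel)
print(feat.shape)  # (3, 3, 3)
```

Because the kernel spans frames, each output value mixes information across time, which is what lets these models capture motion without an explicit temporal module.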
“…Gilmer et al. [14] later formulated the message passing module in GNNs as a learnable neural network. Recently, GNNs have been successfully applied in many fields, including molecular biology [14], computer vision [48,71,76], machine learning [62] and natural language processing [2]. Another popular trend in GNNs is to generalize the convolutional architecture over arbitrary graph-structured data [10,40,26], which is called the graph convolutional neural network (GCNN).…”
Section: Graph Neural Network
confidence: 99%
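The graph convolution mentioned above reduces, in its simplest common form, to one round of message passing: average each node's neighborhood features (including itself) and apply a learned linear transform plus nonlinearity. A minimal sketch, assuming the mean-aggregation variant with a toy 3-node graph (the weight matrix `W` here is just the identity for illustration):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer: add self-loops to the adjacency,
    row-normalize by degree, aggregate neighbor features, transform by W,
    and apply ReLU. A sketch of the common mean-aggregation GCN form."""
    A_hat = A + np.eye(A.shape[0])             # self-loops keep own features
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # inverse degree matrix
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)

# toy chain graph: node 0 -- node 1 -- node 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.array([[1., 0.],     # per-node 2-d input features
              [0., 1.],
              [1., 1.]])
W = np.eye(2)               # identity transform for readability

H = gcn_layer(A, X, W)
print(H)  # each row is the mean of that node's closed neighborhood
```

Stacking such layers lets information propagate over longer paths in the graph, which is the mechanism the region-graph approaches above exploit over object nodes.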
“…One limitation of this approach is that objects are detected by an object detector pre-trained on extra training data with a closed vocabulary. In contrast to [47], we build a category-agnostic relation module to detect any context that is highly related to the action in a weakly-supervised manner, without the need for object detectors or extra ... annotations. The most closely related work is [40], which treats each location in the feature map of the image as an object proxy and computes actor-object relation feature maps via an additional convolutional layer.…”
Section: Related Work
confidence: 99%
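The detector-free idea described above — treating every feature-map location as an object proxy — can be sketched as a soft attention from an actor feature over all spatial positions. This is an illustrative approximation, not the cited papers' exact module (which uses an additional learned convolutional layer rather than raw dot-product similarity):

```python
import numpy as np

def actor_context_relation(actor, fmap):
    """Weight every spatial location of a (H, W, C) feature map by its
    dot-product similarity to a C-dim actor feature, softmax over
    locations, and return the attended context vector plus the
    attention map. A hypothetical sketch of location-as-object-proxy
    relation reasoning; no object detector is involved."""
    H, W, C = fmap.shape
    flat = fmap.reshape(-1, C)
    sims = flat @ actor                     # similarity per location
    weights = np.exp(sims - sims.max())     # numerically stable softmax
    weights /= weights.sum()
    context = (weights[:, None] * flat).sum(axis=0)
    return context, weights.reshape(H, W)

# toy example: one location strongly matches the actor feature
fmap = np.zeros((2, 2, 3))
fmap[0, 0] = [10.0, 0.0, 0.0]
actor = np.array([1.0, 0.0, 0.0])
context, attn = actor_context_relation(actor, fmap)
```

Because the attention is computed over all locations, any context correlated with the action can contribute, which is what makes the module category-agnostic.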