Proceedings of the 25th ACM International Conference on Multimedia 2017
DOI: 10.1145/3123266.3123380
Video Visual Relation Detection

Cited by 115 publications (144 citation statements)
References 34 publications
“…One of the key challenges of learning relationships in videos has been the lack of relevant annotated datasets. In this context, the recent work of [29] is inspiring as it contributes manually annotated relations for the ImageNet video dataset. Our work improves upon [29] on multiple fronts: (1) Instead of assuming no temporal contingency between relationships, we introduce a gated fully-connected spatio-temporal energy graph for modeling the inherently rich structure of videos; (2) We extend the study of relation triplets from subject/predicate/object to a more general setting, such as object/verb/scene [32]; (3) We consider a new task, 'relation recognition' (apart from relation detection and tagging), which requires the model to make predictions in a fine-grained manner; (4) For various metrics and tasks, our model demonstrates improved performance.…”
Section: Related Work (mentioning, confidence: 99%)
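The subject/predicate/object triplet, extended with a temporal extent for video, is the annotation unit discussed in the passage above. As a rough illustration, a single annotated relation instance might be represented as in this minimal sketch; the class and field names are assumptions for illustration, not the actual VidVRD annotation schema.

```python
from dataclasses import dataclass

@dataclass
class RelationInstance:
    """One video relation triplet with its temporal extent.

    Names here are illustrative assumptions; the actual VidVRD
    annotation format may differ.
    """
    subject: str      # subject category, e.g. "dog"
    predicate: str    # relation, e.g. "chase"
    object: str       # object category, e.g. "frisbee"
    begin_frame: int  # first frame where the relation holds
    end_frame: int    # last frame where the relation holds

# Example: the triplet <dog, chase, frisbee> holding over frames 12..85.
rel = RelationInstance("dog", "chase", "frisbee", 12, 85)
print(f"<{rel.subject}, {rel.predicate}, {rel.object}> "
      f"frames {rel.begin_frame}-{rel.end_frame}")
```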
“…Evaluation for different methods on the ImageNet Video dataset. * denotes the re-implementation of [29] after fixing bugs in their released code (confirmed by contacting the authors). † denotes the implementation with an additional triplet loss term for language priors [20].…”
Section: Inference Message Passing and Learning (mentioning, confidence: 99%)
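The "additional triplet loss term for language priors [20]" mentioned above is typically a margin-based ranking objective over triplet scores. Below is a hedged sketch in that spirit, not the exact published loss of [20]; the function name and the toy scores are invented for illustration.

```python
import torch
import torch.nn.functional as F

def language_prior_margin_loss(pos_scores: torch.Tensor,
                               neg_scores: torch.Tensor,
                               margin: float = 1.0) -> torch.Tensor:
    """Margin ranking loss over relation-triplet scores.

    Pushes each ground-truth triplet score (pos_scores) to exceed a
    sampled incorrect triplet's score (neg_scores) by at least
    `margin`. A generic sketch, not the exact formulation of [20].
    """
    return F.relu(margin + neg_scores - pos_scores).mean()

# Toy usage: scores for four ground-truth triplets vs. four negatives.
pos = torch.tensor([2.1, 1.7, 0.9, 3.0])
neg = torch.tensor([1.5, 1.9, 0.2, 1.0])
print(language_prior_margin_loss(pos, neg))  # scalar loss tensor
```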
“…Nonetheless, all of them were curated based only on textual resources, neglecting the rich information in visual data. Thereafter, much effort has been devoted to extracting knowledge from visual data, as in NEIL [4], Visual Genome [13] and VidVRD [21]. Even though much research has targeted extracting knowledge from both textual and visual data, few works aim to extract knowledge in vertical domains such as fashion.…”
Section: Related Work (mentioning, confidence: 99%)