Proceedings of the 27th ACM International Conference on Multimedia 2019
DOI: 10.1145/3343031.3351058

Video Relation Detection with Spatio-Temporal Graph

Cited by 71 publications (71 citation statements)
References 11 publications
“…or tree-near-?. Such intuition has been empirically shown benefits in boosting SGG [62,7,28,30,29,71,20,73,58,13,44,59,45]. More specifically, these methods use a conditional random field [79] to model the joint distribution of nodes and edges, where the context is incorporated by message passing among the nodes through edges via a multi-step mean-field approximation [26]; then, the model is optimized by the sum of cross-entropy (XE) losses of nodes (e.g., objects) and edges (e.g., relationships).…”
Section: Introduction (mentioning)
confidence: 94%
“…The idea of multiple hypothesis is first applied to this task by [1] which generates hypothesis for each object pair when performing association. [16] built a spatio-temporal graph between adjacent video segments and used multiple layers of graph convolutional networks to pass messages between graph nodes. Besides, they proposed an online association method with a siamese network and obtained the state-of-the-art results by combining these two parts.…”
Section: Related Work (mentioning)
confidence: 99%
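
The statement above describes passing messages between the nodes of a spatio-temporal graph with stacked graph convolution layers. As a rough illustration only, and not the cited paper's implementation, the sketch below shows one generic graph convolution step in plain Python. The mean aggregation with self-loops, the toy graph, the feature values, and all names (`gcn_layer`, `spatial_edges`, `temporal_edges`) are assumptions made for this example.

```python
from typing import List, Tuple

Vector = List[float]


def gcn_layer(features: List[Vector],
              edges: List[Tuple[int, int]],
              weight: List[Vector]) -> List[Vector]:
    """One graph convolution step: mean over neighbors (plus self), linear map, ReLU."""
    n = len(features)
    # Every node is its own neighbor (self-loop) so it keeps its own features.
    neighbors = [{i} for i in range(n)]
    for a, b in edges:  # edges are treated as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)

    in_dim = len(weight)
    out_dim = len(weight[0])
    output = []
    for i in range(n):
        # Message passing: average the feature vectors of node i's neighborhood.
        agg = [sum(features[j][d] for j in neighbors[i]) / len(neighbors[i])
               for d in range(in_dim)]
        # Linear transform followed by ReLU.
        output.append([max(0.0, sum(agg[d] * weight[d][k] for d in range(in_dim)))
                       for k in range(out_dim)])
    return output


# Hypothetical toy graph: nodes 0 and 1 are objects in one video segment,
# nodes 2 and 3 are the same objects in the adjacent segment.
spatial_edges = [(0, 1), (2, 3)]     # within a segment
temporal_edges = [(0, 2), (1, 3)]    # across adjacent segments
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned weight matrix

hidden = gcn_layer(features, spatial_edges + temporal_edges, identity)
# Stacking a second layer lets information travel two hops.
hidden2 = gcn_layer(hidden, spatial_edges + temporal_edges, identity)
```

In this toy setup node 3 starts with an all-zero feature vector and picks up information from its spatial and temporal neighbors after one layer, which is the effect that stacking several layers extends to more distant nodes.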
“…relational association, which has the greatest difference between relation detection on video and image. The association method in [1] cannot satisfactorily handle various different predicates between each object pair while the siamese network in [16] only adds an appearance similarity score to the original greedy association method but suffers from extra complexity in the training process. In this paper, we differ from the framework of greedy association and propose a brand new effective association method which requires no training process.…”
Section: Related Work (mentioning)
confidence: 99%