Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413841

Adversarial Video Moment Retrieval by Jointly Modeling Ranking and Localization

Abstract: Retrieving video moments from an untrimmed video given a natural language as the query is a challenging task in both academia and industry. Although much effort has been made to address this issue, traditional video moment ranking methods are unable to generate reasonable video moment candidates and video moment localization approaches are not applicable to large-scale retrieval scenario. How to combine ranking and localization into a unified framework to overcome their drawbacks and reinforce each other is ra…

Citations: cited by 30 publications (12 citation statements)
References: 35 publications (50 reference statements)
“…21). AVMR [129] treats RL-based module as a generator, and devises a Bayesian ranking module as discriminator to rank proposals. STRONG [130] considers both appearance and motion features, and employs parallel spatial-level and temporal-level RL modules for moment localization.…”
Section: Reinforcement Learning-based Methods
mentioning
confidence: 99%
“…To align the text and video, Hahn et al [68] used gated attention [69] showing their model achieves faster localization than the traditional one. Cao et al [70] proposed an adversarial learning paradigm combining reinforcement learning and moment ranking to solve the NLVL problem. In this way, it is more easier to learn the differences between moments within the same video and can select target segment more efficiently.…”
Section: A Supervised Methods
mentioning
confidence: 99%
“…The action space for each step is a set of handcraft-designed temporal transformations (e.g., shifting, scaling). The typical methods include R-W-M [22], SM-RL [62], TripNet [21], STRONG [2], TSP-PRL [65] and AVMR [3].…”
Section: Reinforcement Learning-based Methods
mentioning
confidence: 99%
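
The statement above describes an action space of hand-designed temporal transformations such as shifting and scaling. As a rough illustration only, the sketch below applies two such actions to a candidate window. The function names, the (start, end) representation in seconds, and the clipping behavior are all assumptions made for this example; none of them come from AVMR or the other cited methods.

```python
from typing import Tuple

Window = Tuple[float, float]  # hypothetical (start, end) window in seconds


def clip(window: Window, duration: float) -> Window:
    """Keep the window inside [0, duration] with start <= end."""
    start, end = window
    start = max(0.0, min(start, duration))
    end = max(start, min(end, duration))
    return (start, end)


def shift(window: Window, delta: float, duration: float) -> Window:
    """Move both boundaries by delta seconds (negative moves left)."""
    start, end = window
    return clip((start + delta, end + delta), duration)


def scale(window: Window, factor: float, duration: float) -> Window:
    """Grow or shrink the window around its center by factor."""
    start, end = window
    center = (start + end) / 2.0
    half = (end - start) / 2.0 * factor
    return clip((center - half, center + half), duration)
```

In an RL-based localizer, an agent would pick one such action per step and stop when it judges the window to match the query; that policy is not shown here.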
“…The alignment network will predict a confidence score to determine when to stop. Meanwhile, AVMR [3] addresses TSGV under the adversarial learning paradigm, which designs a RL-based proposal generator to generate proposal candidates and employs Bayesian Personalized Ranking as a discriminator to rank these generated moment proposals in a pairwise manner.…”
mentioning
confidence: 99%
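
The statement above says the discriminator ranks generated proposals pairwise with Bayesian Personalized Ranking. For orientation, here is a minimal sketch of the standard BPR pairwise objective, -log σ(s_pos - s_neg), applied to two scalar scores. How AVMR computes those scores, and whether its discriminator loss matches this form exactly, is not stated in the excerpts; `bpr_loss` and its arguments are hypothetical names.

```python
import math


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """Standard BPR pairwise loss: -log(sigmoid(pos_score - neg_score)).

    pos_score: score of the proposal that should rank higher (assumed input).
    neg_score: score of the proposal that should rank lower (assumed input).
    """
    diff = pos_score - neg_score
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch for numerical stability.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

The loss shrinks toward zero as the preferred proposal's score exceeds the other's by a wider margin, and it grows roughly linearly when the pair is ranked the wrong way round.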