Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413840

STRONG: Spatio-Temporal Reinforcement Learning for Cross-Modal Video Moment Localization

Cited by 39 publications (27 citation statements)
References: 36 publications
“…The action space for each step is a set of handcraft-designed temporal transformations (e.g., shifting, scaling). The typical methods include R-W-M [22], SM-RL [62], TripNet [21], STRONG [2], TSP-PRL [65] and AVMR [3].…”
Section: Reinforcement Learning-based Methods (mentioning)
confidence: 99%
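
The "handcraft-designed temporal transformations" mentioned in this statement can be pictured as a small, fixed set of functions that each nudge a candidate (start, end) window. The following is a minimal sketch under stated assumptions, not the action set of STRONG or of any other cited method: the action names, the normalised [0, 1] time axis, the `step` parameter, and the helpers `shift`, `scale`, and `make_actions` are all hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, Tuple

# A temporal window as (start, end), expressed as fractions of the video length.
Window = Tuple[float, float]


def _clip(start: float, end: float, min_len: float = 0.01) -> Window:
    """Keep the window inside [0, 1] and at least min_len long."""
    start = max(0.0, min(start, 1.0 - min_len))
    end = min(1.0, max(end, start + min_len))
    return (start, end)


def shift(window: Window, delta: float) -> Window:
    """Move the window by delta while preserving its length."""
    start, end = window
    length = end - start
    new_start = min(max(start + delta, 0.0), 1.0 - length)
    return (new_start, new_start + length)


def scale(window: Window, factor: float) -> Window:
    """Grow or shrink the window around its centre."""
    start, end = window
    centre = (start + end) / 2
    half = (end - start) * factor / 2
    return _clip(centre - half, centre + half)


def make_actions(step: float = 0.1) -> Dict[str, Callable[[Window], Window]]:
    """Build a hypothetical discrete action space of temporal transformations."""
    return {
        "shift_left": lambda w: shift(w, -step),
        "shift_right": lambda w: shift(w, step),
        "expand": lambda w: scale(w, 1 + step),
        "shrink": lambda w: scale(w, 1 - step),
        "stop": lambda w: w,  # the agent keeps the current boundaries
    }


if __name__ == "__main__":
    actions = make_actions(step=0.25)
    window = (0.25, 0.5)
    for name in ("shift_right", "expand", "stop"):
        window = actions[name](window)
        print(name, window)
```

In a reinforcement learning setup of the kind the statement describes, a policy would pick one of these action names at each step and the episode would end when it picks the stopping action; the policy and reward are deliberately left out of this sketch.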
“…Wang et al. [62] propose an RNN-based RL model which sequentially observes a selective set of video frames and finally obtains the temporal boundaries given the query. Cao et al. [2] firstly leverage the spatial scene tracking task, which utilizes a spatial-level RL for filtering out the information that is not relevant to the text query. The spatial-level RL can enhance the temporal-level RL for adjusting the temporal boundaries of the video.…”
mentioning
confidence: 99%
“…The task localizes a video segment by a distinct and describable sentence from a video. One kind of dichotomy is late-fusion [1] and early-fusion [6,12,13,23,24,28,34,43,45]. Late-fusion approach computes offline query-agnostic video feature while early-fusion approach computes query-aware video features.…”
Section: Related Work (mentioning)
confidence: 99%
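
The late-fusion versus early-fusion distinction drawn in this statement comes down to whether the video feature depends on the query. The toy sketch below illustrates that difference only; it is not taken from any of the cited methods. Mean pooling, softmax attention over frames, the dot-product score, and the function names `pool_query_agnostic` and `pool_query_aware` are all assumptions made for the example.

```python
import math
from typing import List

Vector = List[float]
Clip = List[Vector]  # one feature vector per frame


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pool_query_agnostic(clip: Clip) -> Vector:
    """Late fusion: mean-pool the frames without looking at the query.

    The result can be computed offline once and reused for every query.
    """
    n = len(clip)
    return [sum(frame[i] for frame in clip) / n for i in range(len(clip[0]))]


def pool_query_aware(clip: Clip, query: Vector) -> Vector:
    """Early fusion: weight each frame by its softmax similarity to the query.

    The result changes with the query, so it cannot be cached offline.
    """
    sims = [dot(frame, query) for frame in clip]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * frame[i] for w, frame in zip(weights, clip))
        for i in range(len(clip[0]))
    ]


if __name__ == "__main__":
    clip = [[1.0, 0.0], [0.0, 1.0]]  # two frames, two feature dimensions
    query = [1.0, 0.0]
    print("late :", dot(pool_query_agnostic(clip), query))
    print("early:", dot(pool_query_aware(clip, query), query))
```

In this toy case the query-aware feature leans towards the frame that matches the query, so it scores higher against that query than the query-agnostic feature does; the cost is that it has to be recomputed for each new query.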
“…Deep learning has achieved great success in the field of multimedia [5,9,17,27] in recent years, due to the advanced learning ability of models, the growing computing capability of machines, and the availability of big data. A learning model fed with sufficient, high-quality data is likely to yield more accurate results.…”
Section: Introduction (mentioning)
confidence: 99%