2018
DOI: 10.1007/978-3-030-00767-6_32
VAL: Visual-Attention Action Localizer

Cited by 26 publications (21 citation statements)
References 14 publications
“…Another way is to use the surrounding clips as the local context for a moment. Gao et al., Liu et al., Song et al., and Ge et al. concatenate the moment feature with the clip features before and after the current clip as its representation [5], [11]-[13]. Since these methods only consider one or two specific moments, the rich context information from other possible moments is ignored.…”
Section: Related Work
confidence: 99%
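The surrounding-clip strategy quoted above amounts to concatenating a clip's feature with the features of its neighbours. The sketch below is a minimal illustration of that idea, not the exact formulation of [5], [11]-[13]; the precomputed clip features, the single-neighbour window, and the boundary handling are all illustrative assumptions.

```python
import torch

def local_context_feature(clip_feats: torch.Tensor, idx: int) -> torch.Tensor:
    """Represent clip `idx` as [previous clip | current clip | next clip].

    clip_feats: (num_clips, feat_dim) precomputed clip features.
    Boundary clips reuse themselves as the missing neighbour
    (a simplifying assumption, not taken from the cited papers).
    """
    num_clips = clip_feats.size(0)
    prev_feat = clip_feats[idx - 1] if idx > 0 else clip_feats[idx]
    next_feat = clip_feats[idx + 1] if idx < num_clips - 1 else clip_feats[idx]
    return torch.cat([prev_feat, clip_feats[idx], next_feat], dim=0)

# Example: 10 clips with 512-d features; context-augmented feature of clip 4.
feats = torch.randn(10, 512)
moment_repr = local_context_feature(feats, 4)  # shape: (1536,)
```

As the statement notes, such a representation only sees one or two neighbouring moments, so context from other candidate moments in the video is not captured.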
“…The key idea of cross-modal attention is to attend to relevant video clips/moments or query words from the other modality. Some methods attend to relevant video features through words [12], [28], while most others attend to both the relevant video features and the words via a co-attention module [13], [16], [17], [19], [20], [23], [25], [27], [28], [35], [37], [38]. For sentence syntactic modeling, Zhang et al. [17] enhance the sentence modeling with the query's syntactic graph.…”
Section: Related Work
confidence: 99%
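As a rough illustration of the co-attention idea in the statement above, the sketch below computes a clip-word affinity matrix and attends in both directions. It is a generic scaled dot-product co-attention under assumed dimensions, not the specific module of any of the cited papers.

```python
import torch
import torch.nn.functional as F

def co_attention(clip_feats: torch.Tensor, word_feats: torch.Tensor):
    """Generic co-attention between video clips and query words.

    clip_feats: (num_clips, d)   word_feats: (num_words, d)
    Returns each modality re-expressed as an attention-weighted
    sum over the other modality.
    """
    d = clip_feats.size(-1)
    # Affinity between every clip and every word: (num_clips, num_words).
    affinity = clip_feats @ word_feats.t() / d ** 0.5
    attended_words = F.softmax(affinity, dim=1) @ word_feats      # per clip
    attended_clips = F.softmax(affinity.t(), dim=1) @ clip_feats  # per word
    return attended_clips, attended_words

# Example: 20 clips and a 7-word query, both projected to a shared 256-d space.
clips, words = torch.randn(20, 256), torch.randn(7, 256)
attended_clips, attended_words = co_attention(clips, words)
```

Attending in only one direction (words over video features) corresponds to the first group of methods in the statement; using both directions corresponds to the co-attention group.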
“…One way is to use the whole video as the global context. Specifically, several methods follow this design (Gao et al. 2017; Liu et al. 2018b; Song and Han 2018; Ge et al. 2019). Since these methods model the context with a one-dimensional sliding window, moments longer than the window would be ignored.…”
Section: Related Work
confidence: 99%
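The fixed-window limitation mentioned above follows directly from how one-dimensional sliding windows enumerate candidate moments: no candidate can be longer than the largest window. The window lengths and stride below are illustrative assumptions.

```python
def sliding_window_candidates(num_clips, window_lengths=(2, 4, 8), stride=1):
    """Enumerate candidate moments as (start, end) clip spans, end exclusive."""
    candidates = []
    for length in window_lengths:
        for start in range(0, num_clips - length + 1, stride):
            candidates.append((start, start + length))
    return candidates

cands = sliding_window_candidates(num_clips=32)
print(max(end - start for start, end in cands))  # -> 8
# A ground-truth moment spanning, say, 12 clips is longer than the largest
# window, so no single candidate can cover it.
```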
“…Video moment localization with natural language has a wide range of applications, such as video question answering (Lei et al. 2018), video content retrieval (Shao et al. 2018), and video storytelling (Gella, Lewis, and Rohrbach 2018). Most current language-queried moment localization models follow a two-step pipeline (Gao et al. 2017; Hendricks et al. 2017; Ge et al. 2019; Liu et al. 2018b; Song and Han 2018). Moment candidates are first selected from the input video with sliding windows.…”
Section: Introduction
confidence: 99%
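The two-step pipeline described above (candidate selection, then matching against the query) can be sketched as follows. Mean-pooling a candidate's clip features and scoring with cosine similarity are simplifying assumptions; the cited models use learned cross-modal matching networks rather than a fixed similarity.

```python
import torch
import torch.nn.functional as F

def localize(clip_feats, query_feat, candidates):
    """Rank pre-selected candidate moments against a sentence embedding.

    clip_feats: (num_clips, d) video clip features.
    query_feat: (d,) embedding of the natural-language query.
    candidates: list of (start, end) clip spans from step 1 (e.g. sliding windows).
    Returns the highest-scoring (start, end) span.
    """
    scores = []
    for start, end in candidates:
        moment_feat = clip_feats[start:end].mean(dim=0)  # pool the span
        scores.append(F.cosine_similarity(moment_feat, query_feat, dim=0))
    best = int(torch.stack(scores).argmax())
    return candidates[best]

# Example: step 1 produced four sliding-window candidates for a 32-clip video.
clips, query = torch.randn(32, 256), torch.randn(256)
best_span = localize(clips, query, [(0, 8), (4, 12), (10, 18), (20, 28)])
```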