2022
DOI: 10.48550/arxiv.2204.01450
Preprint

Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding

Abstract: Grounding temporal video segments described in natural language queries effectively and efficiently is a crucial capability needed in vision-and-language fields. In this paper, we deal with the fast video temporal grounding (FVTG) task, aiming at localizing the target segment with high speed and favorable accuracy. Most existing approaches adopt elaborately designed cross-modal interaction modules to improve the grounding performance, which suffer from the test-time bottleneck. Although several common space-ba…
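The speed argument in the abstract rests on the contrast between cross-modal interaction (which must be re-run for every moment-query pair at test time) and common-space retrieval. The sketch below is a minimal illustration of that common-space idea, not the paper's model: it assumes a two-branch setup where moment and query embeddings are computed independently, so per-video moment embeddings can be cached offline and grounding reduces to a single similarity lookup. All module names, feature dimensions, and the toy data are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MomentEncoder(nn.Module):
    """Projects per-moment video features into the shared embedding space."""
    def __init__(self, video_dim=1024, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(video_dim, embed_dim), nn.ReLU(),
                                  nn.Linear(embed_dim, embed_dim))

    def forward(self, moment_feats):                 # (num_moments, video_dim)
        return F.normalize(self.proj(moment_feats), dim=-1)

class QueryEncoder(nn.Module):
    """Projects a pooled sentence feature into the shared embedding space."""
    def __init__(self, text_dim=768, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(text_dim, embed_dim), nn.ReLU(),
                                  nn.Linear(embed_dim, embed_dim))

    def forward(self, query_feat):                   # (text_dim,)
        return F.normalize(self.proj(query_feat), dim=-1)

moment_enc, query_enc = MomentEncoder(), QueryEncoder()

# Offline: moment embeddings are computed once per video and cached.
moment_feats = torch.randn(100, 1024)                # 100 candidate moments
with torch.no_grad():
    moment_embs = moment_enc(moment_feats)

    # Online: one query encoding plus one matrix-vector product per query.
    query_emb = query_enc(torch.randn(768))

scores = moment_embs @ query_emb                     # cosine similarity per moment
best_moment = scores.argmax().item()
print(f"predicted moment index: {best_moment}")

Because no module ever sees a moment and the query together, the per-query cost is one text forward pass and one dot-product scan over cached moment embeddings, which is what makes this family of methods fast at test time.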

Cited by 0 publications
References 55 publications
