A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

Lan, Xiaohan; Yuan, Yitian; Wang, Xin; Chen, Long; Wang, Zhi; Ma, Lin; Zhu, Wenwu

doi:10.48550/arxiv.2203.05243

2022


Preprint


Abstract: Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence that indicates complex human activities in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, enabling several simple baselines even without training to achieve state-of-the-art (SOTA) performance. In this paper, we take a closer look at existing evaluation protocols for TSGV, and fin…
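To make the bias problem concrete, the following is a minimal Python sketch of a prior-only baseline of the kind the abstract alludes to: it ignores both the video and the query and always predicts the most frequent normalized moment in the training annotations. This is an illustration only, not the authors' method or evaluation protocol; the function names, the 4-bin discretization, the single IoU threshold, and the toy spans are all assumptions made for the example.

```python
from collections import Counter


def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # The hull equals the union whenever the spans overlap;
    # when they do not, inter is 0 and the IoU is 0 anyway.
    hull = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / hull if hull > 0 else 0.0


def fit_prior(train_spans, bins=4):
    """Return the most frequent (start, end) cell of a bins x bins grid.

    train_spans holds spans normalized by video length, so 0 <= s < e <= 1.
    (Hypothetical helper; the discretization is an assumption.)
    """
    counts = Counter(
        (min(int(s * bins), bins - 1), min(int(e * bins), bins - 1))
        for s, e in train_spans
    )
    (start_bin, end_bin), _ = counts.most_common(1)[0]
    return (start_bin / bins, (end_bin + 1) / bins)


def recall_at_iou(pred, test_spans, threshold=0.5):
    """Fraction of test spans that the single fixed prediction matches."""
    hits = sum(temporal_iou(pred, gt) >= threshold for gt in test_spans)
    return hits / len(test_spans)


# Toy annotations (made up): most training moments sit at the video start.
train = [(0.0, 0.3), (0.05, 0.35), (0.0, 0.28), (0.5, 0.9)]
same_distribution = [(0.0, 0.3), (0.1, 0.4)]
shifted_distribution = [(0.6, 0.9), (0.7, 1.0)]

prior = fit_prior(train)
iid_recall = recall_at_iou(prior, same_distribution)
ood_recall = recall_at_iou(prior, shifted_distribution)
print(prior, iid_recall, ood_recall)
```

On these toy spans the fixed prediction scores well when the test moments follow the training distribution and fails when they are shifted, which is the kind of gap a debiased evaluation is meant to expose.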


Cited by 1 publication

(1 citation statement)

References 46 publications


“…Some other methods attempt to view the VG problem from the perspective of causality and debias the base model with causal intervention (Yang et al. 2021; Lan et al. 2022; Bao and Mu 2022). To the best of our knowledge, our method is the first to adopt curriculum learning-based data augmentation for debiased video grounding, which is orthogonal to existing methods.…”

mentioning

confidence: 99%

Curriculum Multi-Negative Augmentation for Debiased Video Grounding

Lan, Yuan, Chen et al. 2023

AAAI


Video Grounding (VG) aims to locate the desired segment in a video given a sentence query. Recent studies have found that current VG models are prone to over-rely on the distribution biases of the ground-truth moment annotations in the training set. To discourage the standard VG model from exploiting such temporal annotation biases and to improve its generalization ability, we propose multiple negative augmentations in a hierarchical way, including cross-video augmentations at the clip and video levels, and self-shuffled augmentations with masks. These augmentations can effectively diversify the data distribution so that the model can make more reasonable predictions instead of merely fitting the temporal biases. However, directly adopting such a data augmentation strategy may introduce some noise, as shown in our cases, since not all of the handcrafted augmentations are semantically irrelevant to the ground-truth video. To further denoise and improve the grounding accuracy, we design a multi-stage curriculum strategy to adaptively train the standard VG model from easy to hard negative augmentations. Experiments on the newly collected Charades-CD and ActivityNet-CD datasets demonstrate that our proposed strategy can improve the performance of the base model in both i.i.d. and o.o.d. scenarios.

