Proceedings of the 27th ACM International Conference on Multimedia 2019
DOI: 10.1145/3343031.3351044

Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization

Abstract: Temporal action localization is an important yet challenging research topic due to its various applications. Since the frame-level or segment-level annotations of untrimmed videos require amounts of labor expenditure, studies on the weakly-supervised action detection have been springing up. However, most of existing frameworks rely on Class Activation Sequence (CAS) to localize actions by minimizing the video-level classification loss, which exploits the most discriminative parts of actions but ignores the min…

Cited by 50 publications (13 citation statements)
References 28 publications
“…With the rapid development of artificial intelligence techniques [18], [19], [20], [21], great progress has been made in many isolated applications such as causal inference [22], named entities identification [23], question answering [24], scene text spotting [5], [6], [17] and video understanding [25], [26]. However, it is very important to build multiple knowledge representation [27] for understanding the real and complex world.…”
Section: Related Work (mentioning)
confidence: 99%
“…To handle the two problems, existing methods can be divided into three types. The first type of works attempt to solve the localization completeness by applying a well-designed erasing strategy [37,55,53] or a multi-branch architecture [21]. For example, Zhong et al [55] design a step-by-step erasion approach to train the one-by-one classifiers, via collecting detection results from these classifiers, more action segments are found.…”
Section: Related Work (mentioning)
confidence: 99%
“…To relieve this problem, the weakly supervised setting that only requires video-level category labels is proposed [37,55,39,37,55,53,29,30,45,46]. It can be formulated as a multiple instance learning problem, where a video is treated as a bag of multiple segments and fed into a video-level classifier to get a class activation sequence (CAS).…”
Section: Introduction (mentioning)
confidence: 99%
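
The multiple instance learning formulation quoted above can be illustrated with a minimal sketch. It is not the implementation of the cited paper or of any citing work: the linear segment classifier, the top-k mean pooling used to obtain video-level scores, the fixed threshold used for localization, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def class_activation_sequence(features, weights, bias):
    """Score every segment for every class.

    features: (T, D) array, one feature vector per video segment (the "bag").
    weights:  (D, C) array, bias: (C,) array. A linear classifier is assumed.
    Returns the class activation sequence (CAS) with shape (T, C).
    """
    return features @ weights + bias


def video_level_scores(cas, k):
    """Pool the CAS into one score per class.

    Assumption: top-k mean pooling over time, a common MIL aggregation.
    """
    top_k = np.sort(cas, axis=0)[-k:]  # the k highest activations per class
    return top_k.mean(axis=0)


def localize(cas, class_idx, threshold):
    """Threshold one class's CAS and return [start, end) segment index pairs.

    Assumption: a fixed threshold; real systems tune or learn this step.
    """
    active = cas[:, class_idx] > threshold
    segments, start = [], None
    for t, on in enumerate(active):
        if on and start is None:
            start = t  # a run of active segments begins
        elif not on and start is not None:
            segments.append((start, t))  # the run ended at t (exclusive)
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments
```

Only the video-level scores would be compared against the video-level labels during training; the segment boundaries are read off the CAS afterwards, which is why such a pipeline tends to highlight only the most discriminative segments.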
“…The past decade has witnessed the great efforts in action understanding [1,2,3,4,5], among which action detection is receiving the most considerable attention [6,7,8,9,10,11,12]. Action detection targets predicting if an action occurs in a video that has its complete observation; meanwhile, finding the relevant spatial-temporal location.…”
Section: Introduction (mentioning)
confidence: 99%