CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization

Kang, Hyolim; Kim, Kyungmin; Ko, Yu-Min; Kim, Seon Joo

doi:10.1109/iccv48922.2021.01347

Cited by 5 publications

(21 citation statements)

References 30 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…With the proliferation of video platforms, video understanding tasks are drawing substantial attention in computer vision community. The prevailing convention for video processing [3,15,16,21,28] is still dividing the whole video into short non-overlapping snippets with a fixed duration, which neglects the semantic continuity of the video. On the other hand, cognitive scientists have observed that human senses the visual stream as a set of events [39], which alludes that there is room for research to find out a video parsing method * equal contribution, ordered by surname that preserves semantic validity and interpretability of video snippets.…”

Section: Introductionmentioning

confidence: 99%

UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection

Kang

Kim

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

Generic Event Boundary Detection (GEBD) is a newly suggested video understanding task that aims to find one level deeper semantic boundaries of events. Bridging the gap between natural human perception and video understanding, it has various potential applications, including interpretable and semantically valid video parsing. Still at an early development stage, existing GEBD solvers are simple extensions of relevant video understanding tasks, disregarding GEBD's distinctive characteristics. In this paper, we propose a novel framework for unsupervised/supervised GEBD, by using the Temporal Self-similarity Matrix (TSM) as the video representation. The new Recursive TSM Parsing (RTP) algorithm exploits local diagonal patterns in TSM to detect boundaries, and it is combined with the Boundary Contrastive (BoCo) loss to train our encoder to generate more informative TSMs. Our framework can be applied to both unsupervised and supervised settings, with both achieving state-of-the-art performance by a huge margin in GEBD benchmark. Especially, our unsupervised method outperforms the previous state-of-the-art "supervised" model, implying its exceptional efficacy.

show abstract

Section: Introductionmentioning

confidence: 99%

UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection

Kang

Kim

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

show abstract

“…The task has recently been received tremendous attention for from researchers [4,20,26,40,22,5,37,41,23,44,9,6,27,29,35,11,39,2]. Detailed approaches related to TAL can be found in the survey article [17]. One problem of TAL which prevents the model from the real-time application is that the model is allowed to exploit the future frames which is not suitable for real application.…”

Section: Temporal Action Localizationmentioning

confidence: 99%

“…In other words, the models are allowed to access the whole frames in a video so that they can take the relationships between all frames into account and apply the post-processing techniques such as Non-maximum suppression (NMS). However, they are inherently impractical for real-world applications such as live sports broadcasting where the frames are sequentially provided and the future Recently, online temporal action localization (On-TAL) has been introduced, incorporating TAL into streaming videos [17]. In the online setting, the model is not allowed to access future frames and can take the past and current frames only.…”

Section: Introductionmentioning

confidence: 99%

“…NMS) also can not be applied to the output of the model, making On-TAL more challenging. The prior work [17] has augmented context information into the actionness grouping process by formulating On-TAL as a Markov Decision Process (MDP). They have successfully incorporated On-TAL into Q-imitation learning and provided a good baseline for On-TAL.…”

Section: Introductionmentioning

confidence: 99%

“…In this paper, we propose a simple framework for On-TAL, termed SimOn, that formulates On-TAL as a sequential prediction problem and learns to immediately predict action instances from a few past information using Transformers [31] in an end-to-end manner. Unlike the prior work [17] where the outputs of the OAD models are sequentially given as the contexts for decision-making, we lever-age past visual information as a long-term context and introduce a learnable context embedding as a short-term context to take the relations of consecutive two frames into account. Since we use only a few past frame information, our method can be carried out with high computational and memory efficiency.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

SimOn: A Simple Framework for Online Temporal Action Localization

Tang¹,

Park²,

Kim³

et al. 2022

Preprint

View full text Add to dashboard Cite

Online Temporal Action Localization (On-TAL) aims to immediately provide action instances from untrimmed streaming videos. The model is not allowed to utilize future frames and any processing techniques to modify past predictions, making On-TAL much more challenging. In this paper, we propose a simple yet effective framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture in an end-to-end manner. Specifically, the model takes the current frame feature as a query and a set of past context information as keys and values of the Transformer. Different from the prior work that uses a set of outputs of the model as past contexts, we leverage the past visual context and the learnable context embedding for the current query. Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms the previous methods, achieving a new state-of-the-art On-TAL performance. In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting. The code is available at https://github.com/TuanTNG/SimOn.

show abstract