Detecting generic, taxonomy-free event boundaries in videos represents a major stride towards holistic video understanding. In this paper, we present a technique for generic event boundary detection based on a two-stream inflated 3D convolution (I3D) architecture, which can learn spatiotemporal features from videos. Our work is inspired by the Generic Event Boundary Detection Challenge (part of the CVPR 2021 Long-Form Video Understanding (LOVEU) Workshop). Throughout the paper, we provide an in-depth analysis of the experiments performed, along with an interpretation of the results obtained. The code for this work can be found at https://github.com/rayush7/GEBD
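To illustrate what "inflated" means here: in the original I3D design, a pretrained 2D kernel is turned into a 3D kernel by repeating it along a new time axis and dividing by the temporal size, so that a static ("boring") video produces the same response as the 2D kernel applied to a single frame. The pure-Python sketch below checks that property on toy data. It is a generic illustration of I3D inflation under stated assumptions (nested lists, stride 1, no padding, single channel), not the authors' code, and the abstract does not say how their network was initialized.

```python
# Sketch of I3D-style kernel inflation (pure Python, single channel).
# Assumptions: kernels/images are nested lists, stride 1, no padding
# ("valid" cross-correlation). Not taken from the GEBD repository.

def inflate_kernel(k2d, t):
    """Repeat a 2D kernel t times along a new time axis, rescaled by 1/t."""
    return [[[w / t for w in row] for row in k2d] for _ in range(t)]

def conv2d_valid(img, k):
    """Valid 2D cross-correlation of one frame with a 2D kernel."""
    kh, kw = len(k), len(k[0])
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

def conv3d_valid(video, k):
    """Valid 3D cross-correlation of a video (T x H x W) with a 3D kernel."""
    kt, kh, kw = len(k), len(k[0]), len(k[0][0])
    t, h, w = len(video), len(video[0]), len(video[0][0])
    return [[[sum(video[s + c][i + a][j + b] * k[c][a][b]
                  for c in range(kt) for a in range(kh) for b in range(kw))
              for j in range(w - kw + 1)]
             for i in range(h - kh + 1)]
            for s in range(t - kt + 1)]

# A static video (the same frame repeated) gives, at every time step,
# the same output as the 2D kernel on one frame (up to float rounding).
frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k2d = [[1, 0], [0, -1]]
out2d = conv2d_valid(frame, k2d)
out3d = conv3d_valid([frame] * 4, inflate_kernel(k2d, 3))
```

The 1/t rescaling is what keeps activations on the same scale as the 2D network, which is why inflated ImageNet weights can serve as an initialization for video models.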
Self-supervised learning (SSL) approaches have made major strides, matching the performance of their supervised counterparts on several computer vision benchmarks. This, however, comes at the cost of substantially larger models and computationally expensive training strategies, which in turn lead to longer inference times and make them impractical for resource-constrained industrial settings. Techniques like knowledge distillation (KD), dynamic computation (DC), and pruning are often used to obtain a lightweight sub-network, but they usually involve multiple epochs of fine-tuning a large pre-trained model, which adds further computational cost. In this work, we propose a novel perspective on the interplay between the SSL and DC paradigms, which can be leveraged to simultaneously learn a dense network and a gated (sparse/lightweight) sub-network from scratch. This offers a good accuracy-efficiency trade-off and therefore yields a generic, multipurpose architecture for application-specific industrial settings. Overall, our study conveys a constructive message: exhaustive experiments on several image classification benchmarks (CIFAR-10, STL-10, CIFAR-100, and ImageNet-100) demonstrate that the proposed training strategy yields a dense network and a corresponding sparse sub-network that perform on par with the vanilla self-supervised setting, but with a significant reduction in computation, measured in FLOPs, under a range of target budgets.
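To make "reduction in FLOPs under a target budget" concrete, the sketch below shows the bookkeeping behind one common form of dynamic computation: per-output-channel gates on a convolution, the resulting multiply-accumulate (MAC) count, and a squared-error penalty pulling the gated-to-dense compute ratio towards a target budget. Every name, the choice of channel-level gating, and the form of the penalty are illustrative assumptions, not the method described in the abstract. In real training the gates would be made differentiable (for example with a Gumbel-softmax relaxation or a straight-through estimator); this sketch only counts compute.

```python
# Hypothetical sketch of budget-aware channel gating (not the paper's method).
# Assumptions: binary gate per output channel of a k x k convolution,
# compute counted as multiply-accumulates (MACs), squared budget penalty.

def conv_macs(c_in, c_out, k, h_out, w_out):
    """MACs of a dense k x k convolution producing c_out x h_out x w_out."""
    return c_in * c_out * k * k * h_out * w_out

def gated_macs(c_in, gates, k, h_out, w_out):
    """MACs when only output channels whose gate is on are computed."""
    active = sum(1 for g in gates if g)
    return conv_macs(c_in, active, k, h_out, w_out)

def budget_penalty(gated, dense, target_ratio):
    """Squared deviation of the used-compute ratio from the target budget."""
    ratio = gated / dense
    return (ratio - target_ratio) ** 2

# Example: 64 -> 128 channels, 3 x 3 kernel, 32 x 32 output, half the gates on.
dense = conv_macs(64, 128, 3, 32, 32)
gates = [1, 0] * 64
gated = gated_macs(64, gates, 3, 32, 32)
```

With half of the 128 gates active, the gated path uses exactly half the dense MACs, so the penalty is zero for a 0.5 budget and positive for any other target.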