Piyush Bagad scite author profile

Modeling and understanding time remains a challenge in contemporary video understanding models. With language emerging as a key driver towards powerful generalization, it is imperative for foundational video-language models to have a sense of time. In this paper, we consider a specific aspect of temporal understanding: consistency of time order as elicited by before/after relations. We establish that six existing video-language models struggle to understand even such simple temporal relations. We then question whether it is feasible to equip these foundational models with temporal awareness without re-training them from scratch. Towards this, we propose a temporal adaptation recipe on top of one such model, VideoCLIP, based on post-pretraining on a small amount of video-text data. We conduct a zero-shot evaluation of the adapted models on six datasets for three downstream tasks which require a varying degree of time awareness. We observe encouraging performance gains especially when the task needs higher time awareness. Our work serves as a first step towards probing and instilling a sense of time in existing video-language models without the need for data and compute-intense training from scratch.


C-3PO: Towards Rotation Equivariant Feature Detection and Description

Bagad

Eijkelboom

Fokkema

et al. 2023


Data-Sharing Economy: Value-Addition from Data meets Privacy

Bagad

Mitra

Dhamnani

et al. 2021





How Severe Is Benchmark-Sensitivity in Video Self-supervised Learning?

Test of Time: Instilling Video-Language Models with a Sense of Time
