Arulkumar Subramaniam scite author profile

Video based computer vision tasks can benefit from estimation of the salient regions and interactions between those regions. Traditionally, this has been done by identifying the object regions in the images by utilizing pre-trained models to perform object detection, object segmentation and/or object pose estimation. Though using pre-trained models seems to be a viable approach, it is infeasible in practice due to the need for exhaustive annotation of object categories, domain gap between datasets and bias present in pre-trained models. To overcome these downsides, we propose to utilize the common rationale that a sequence of video frames capture a set of common objects and interactions between them, thus a notion of co-segmentation between the video frame features may equip the model with the ability to automatically focus on salient regions and improve underlying task's performance in an end-to-end manner. In this regard, we propose a generic module called "Co-Segmentation Activation Module" (COSAM) that can be plugged-in to any CNN to promote the notion of co-segmentation based attention among a sequence of video frame features. We show the application of COSAM in three video based tasks namely: 1) Video-based person re-ID, 2) Video captioning, & 3) Video action classification and demonstrate that COSAM is able to capture salient regions in the video frames, thus leading to notable performance improvements along with interpretable attention maps.

show abstract

Bi-modal First Impressions Recognition Using Temporally Ordered Deep Audio and Stochastic Visual Features

Subramaniam

Patel

Mishra

et al. 2016

View full text Add to dashboard Cite

We propose a novel approach for First Impressions Recognition in terms of the Big Five personality-traits from short videos. The Big Five personality traits is a model to describe human personality using five broad categories: Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness. We train two bi-modal end-to-end deep neural network architectures using temporally ordered audio and novel stochastic visual features from few frames, without over-fitting. We empirically show that the trained models perform exceptionally well, even after training from a small sub-portions of inputs. Our method is evaluated in ChaLearn LAP 2016 Apparent Personality Analysis (APA) competition using ChaLearn LAP APA2016 dataset and achieved excellent performance.

show abstract

NCC-Net: Normalized Cross Correlation Based Deep Matcher with Robustness to Illumination Variations

Subramaniam

Balasubramanian

Mittal

2018

View full text Add to dashboard Cite

Bi-modal First Impressions Recognition using Temporally Ordered Deep Audio and Stochastic Visual Features

Subramaniam¹,

Patel²,

Mishra³

et al. 2016

Preprint

View full text Add to dashboard Cite

Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions

Dutta

Subramaniam

Mittal

2022

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.