2018 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv.2018.00047

A Generative Approach to Zero-Shot and Few-Shot Action Recognition

Abstract: We present a generative framework for zero-shot action recognition where some of the possible action classes do not occur in the training data. Our approach is based on modeling each action class using a probability distribution whose parameters are functions of the attribute vector representing that action class. In particular, we assume that the distribution parameters for any action class in the visual space can be expressed as a linear combination of a set of basis vectors where the combination weights are…
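The abstract describes the core mechanism: each class-conditional distribution in the visual feature space has parameters predicted from the class attribute vector as a linear combination of learned basis vectors, so visual features for unseen classes can be synthesized and passed to an ordinary classifier. The sketch below illustrates that idea only; the dimensions, the Gaussian class model, the ridge-regression fit, and all names are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' code) of a generative zero-shot recognizer:
# each action class is modeled as a Gaussian in visual feature space whose mean
# is a linear combination of basis vectors, with weights given by the class
# attribute vector. Dimensions and the ridge fit are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_attr, n_seen = 512, 64, 40      # feature dim, attribute dim, seen classes

# Toy "seen" data: per-class attribute vectors and empirical feature means.
seen_attrs = rng.normal(size=(n_seen, d_attr))
seen_means = rng.normal(size=(n_seen, d_feat))

# Learn basis vectors B (d_attr x d_feat) so that mean(class) ~= attributes @ B,
# using a ridge-regression closed form as a stand-in for the paper's training.
lam = 1e-2
B = np.linalg.solve(seen_attrs.T @ seen_attrs + lam * np.eye(d_attr),
                    seen_attrs.T @ seen_means)

def synthesize_features(attr, n_samples=100, sigma=0.1):
    """Sample synthetic visual features for a class from its attribute vector."""
    mean = attr @ B                        # predicted class mean in visual space
    return mean + sigma * rng.normal(size=(n_samples, d_feat))

# Zero-shot use: predict unseen-class means from attributes alone, then classify
# test features by nearest predicted class mean (any classifier would do).
unseen_attrs = rng.normal(size=(5, d_attr))
unseen_means = unseen_attrs @ B

def classify(x):
    """Assign a test feature to the nearest predicted unseen-class mean."""
    return int(np.argmin(np.linalg.norm(unseen_means - x, axis=1)))

test_feature = synthesize_features(unseen_attrs[2], n_samples=1)[0]
print("predicted unseen class:", classify(test_feature))
```

In the generalized zero-shot and few-shot settings discussed in the citing papers below, the same synthesis step would be combined with real seen-class (and a few labelled unseen-class) samples; the sketch keeps only the unseen-class synthesis for brevity.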

Cited by 127 publications (86 citation statements). References 34 publications.
“…The best existing approach for GZSL action recognition, GGM [24], employs a generative approach to synthesize unseen class data and utilizes unlabelled real features (C3D) from the unseen classes to rectify the bias of the learned parameters towards seen classes. Particularly, for the UCF101 dataset and manual attributes combination, the proposed approach, CEWGAN-OD, achieves gains of 5.1% and 25.8% (in terms of accuracy) over the CLSWGAN [33] and GGM [24], respectively. Further, for the word2vec embedding, the proposed CEWGAN-OD achieves gains of 16% and 19.8% over the best existing approach, GGM [24], for the HMDB51 and UCF101 datasets, respectively.…”
Section: State-of-the-art Comparison
confidence: 99%
“…Particularly, for the UCF101 dataset and manual attributes combination, the proposed approach, CEWGAN-OD, achieves gains of 5.1% and 25.8% (in terms of accuracy) over the CLSWGAN [33] and GGM [24], respectively. Further, for the word2vec embedding, the proposed CEWGAN-OD achieves gains of 16% and 19.8% over the best existing approach, GGM [24], for the HMDB51 and UCF101 datasets, respectively. ZSL performance comparison: In Tab.…”
Section: State-of-the-art Comparison
confidence: 99%
“…In the standard GZSL setting, we improve by 9.3 and 4.9 over the non-generative model SADLE in [28], for HMDB51 and UCF101, respectively. Generative-model-driven methods GGM [20], f-CLSWGAN [35] and CEWGAN [16] use unseen class prototypes during training to generate unseen class visual samples. We still outperform GGM and f-CLSWGAN, and deliver comparable performance with CEWGAN.…”
Section: Methods
confidence: 99%
“…We focus on inductive ZSL in which test data is fully unknown at training time. There exists a body of literature on transductive ZSL [1,33,54,55,59,58,60], where test images or videos are available during training but test labels are not. We do not discuss the transductive approach in this work.…”
Section: Related Work
confidence: 99%
“…To our knowledge, all current ZSL methods for video recognition use pretrained visual embeddings [1,4,18,33,35,54,55,58,59,60,61,64]. This provides a good tradeoff between training efficiency and using prior knowledge.…”
Section: Introduction
confidence: 99%