2020
DOI: 10.1609/aaai.v34i03.5649

Harnessing GANs for Zero-Shot Learning of New Classes in Visual Speech Recognition

Abstract: Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine learning based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracies in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen …

Cited by 12 publications (14 citation statements)
References 24 publications
“…In 2020, the long-tail item recommendation method began to the deep learning method. For example, Bai et al [36] use stacked denoising autoencoders (SDAE) to realize online long-tail item recommendation, and Kumar et al [12] proposed to realize long-tail item recommendation using few shot learning. Bai et al [36] proposed a deep learning framework for long-tail item recommendation (DLTSR).…”
Section: Multiobjective Optimization-based Long-tail Item (mentioning)
confidence: 99%
“…The content-based recommendation method and collaborative filtering recommendation method [2,3] are classic methods in the recommender system. Machine learning and deep learning have great advantages in learning the inherent laws and representation levels of sample data and have made many research achievements in image classification [4][5][6][7], object detection [8][9][10][11], speech recognition [12,13], and emotion recognition [14]. Therefore, researchers combine machine learning, deep learning, knowledge graph, and other technologies in these basic methods, allowing recommender systems to be widely used in news, tourism, e-commerce, and other fields.…”
Section: Introduction (mentioning)
confidence: 99%
“…A related task to video frame interpolation is talking face generation. Here, given an audio waveform, the task is to synthesize a talking face [4,5,6]. In recent times, these approaches have become popular for both academic and non-academic purposes [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…In recent times, these approaches have become popular for both academic and non-academic purposes [7]. While, on the one hand, they are being used to extend speechreading models to low resource languages [6], on the other, many of them are also used to generate fake news and paid content as well.…”
Section: Introduction (mentioning)
confidence: 99%