2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv45572.2020.9093307

Audio-Visual Model Distillation Using Acoustic Images

Abstract: In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images, a novel audio data modality. Previous models learn audio representations from raw signals or spectral data acquired by a single microphone, with remarkable results in classification and retrieval. However, such representations are not robust to variable environmental sound conditions. We tackle this drawback by exploiting a new multimodal labeled action recogni…
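The abstract describes a teacher-student setup in which models that see acoustic images transfer knowledge to an audio classifier that only receives single-microphone input. As a rough, hypothetical sketch of that idea (not the paper's actual architecture: the student network, class count, temperature, and loss weights below are placeholder assumptions), a standard temperature-scaled distillation loss looks like this:

import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 14  # placeholder; depends on the dataset

class SpectrogramStudent(nn.Module):
    """Small CNN over single-microphone spectrograms (illustrative only)."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, 1, freq, time)
        return self.classifier(self.features(x).flatten(1))

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft targets from the acoustic-image teacher plus hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard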

Cited by 26 publications (27 citation statements); references 31 publications. All 27 citation statements are classed as mentioning (none supporting or contrasting), with citing publications spanning 2020 to 2023.

Citation statements, ordered by relevance:
“…To satisfy these requirements, knowledge distillation is widely studied and applied in many speech recognition tasks. There are many knowledge distillation systems for designing lightweight deep acoustic models for speech recognition (Chebotar and Waters, 2016; Wong and Gales, 2016; Chan et al., 2015; Price et al., 2016; Fukuda et al., 2017; Bai et al., 2019b; Ng et al., 2018; Albanie et al., 2018; Lu et al., 2017; Shi et al., 2019a; Roheda et al., 2018; Shi et al., 2019b; Gao et al., 2019; Ghorbani et al., 2018; Takashima et al., 2018; Watanabe et al., 2017; Shi et al., 2019c; Asami et al., 2017; Huang et al., 2018; Shen et al., 2018; Perez et al., 2020; Shen et al., 2019a; Oord et al., 2018). In particular, these KD-based speech recognition applications include spoken language identification (Shen et al., 2018, 2019a), text-independent speaker recognition (Ng et al., 2018), audio classification (Gao et al., 2019; Perez et al., 2020), speech enhancement (Watanabe et al., 2017), acoustic event detection (Price et al., 2016; Shi et al., 2019a,b), speech synthesis (Oord et al., 2018) and so on.…”
Section: KD in Speech Recognition
mentioning, confidence: 99%
“…Most existing knowledge distillation methods for speech recognition use teacher-student architectures to improve the efficiency and recognition accuracy of acoustic models (Chan et al., 2015; Chebotar and Waters, 2016; Lu et al., 2017; Price et al., 2016; Shen et al., 2018; Gao et al., 2019; Shen et al., 2019a; Shi et al., 2019c,a; Watanabe et al., 2017; Perez et al., 2020). Using a recurrent neural network (RNN) to hold the temporal information from speech sequences, the knowledge from the teacher RNN acoustic model was transferred into a small student DNN model (Chan et al., 2015).…”
Section: KD in Speech Recognition
mentioning, confidence: 99%
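To make the RNN-to-DNN transfer in that last passage concrete, below is a minimal, hypothetical PyTorch sketch (after Chan et al., 2015): a recurrent teacher produces per-frame posteriors, and a small feed-forward student is trained to match them. The feature dimension, layer sizes, and number of output states are illustrative assumptions.

import torch.nn as nn
import torch.nn.functional as F

class RNNTeacher(nn.Module):
    """Recurrent acoustic model producing per-frame state posteriors."""
    def __init__(self, feat_dim=40, hidden=512, num_states=2000):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_states)

    def forward(self, frames):  # frames: (batch, time, feat_dim)
        h, _ = self.rnn(frames)
        return self.out(h)  # (batch, time, num_states)

class DNNStudent(nn.Module):
    """Small feed-forward model scored frame by frame."""
    def __init__(self, feat_dim=40, hidden=256, num_states=2000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_states),
        )

    def forward(self, frames):
        return self.net(frames)

def frame_kd_loss(student_logits, teacher_logits):
    """Cross-entropy of the student against the teacher's frame posteriors."""
    log_p = F.log_softmax(student_logits, dim=-1)
    q = F.softmax(teacher_logits, dim=-1)
    return -(q * log_p).sum(-1).mean()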
“…c) A model that aims to support the editorial decision process should only assume the availability of human review text during training, and be able to make recommendations in its absence. Inspired by missing modality hallucination methods (Hoffman et al., 2016; Tang et al.; Pérez et al., 2020), we propose a realistic system that uses all available data for training, but imputes review representations at test time based on the abstract text.…”
Section: Contributions
mentioning, confidence: 99%
“…Figure 1 depicts an overview of our architecture. Inspired by modality hallucination studies (Hoffman et al., 2016; Pérez et al., 2020), we use the abstract module to predict both the abstract representation h_i^abs…”
mentioning, confidence: 99%
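As the two passages above suggest, modality hallucination trains an extra branch to regress the missing modality's representation so it can be imputed at test time. Below is a hypothetical sketch of that pattern; the encoder names, feature sizes, and regression loss are placeholder assumptions, not the cited systems' implementations.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HallucinationModel(nn.Module):
    """Classifier with a branch that imputes a missing modality's features."""
    def __init__(self, in_dim=768, dim=256, num_classes=2):
        super().__init__()
        self.abstract_encoder = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())
        self.review_encoder = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())
        self.hallucinate = nn.Linear(dim, dim)  # predicts review features from the abstract
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, abstract_feats, review_feats=None):
        h_abs = self.abstract_encoder(abstract_feats)
        h_hal = self.hallucinate(h_abs)  # imputed review representation
        hal_loss = None
        if review_feats is not None:  # training: real reviews available
            h_rev = self.review_encoder(review_feats)
            hal_loss = F.mse_loss(h_hal, h_rev.detach())
            logits = self.classifier(torch.cat([h_abs, h_rev], dim=-1))
        else:  # test time: impute the review representation
            logits = self.classifier(torch.cat([h_abs, h_hal], dim=-1))
        return logits, hal_loss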
“…Alwassel et al. [399] proposed a self-supervised method, called Cross-Modal Deep Clustering (XDC), to utilize the semantic correlation and the differences between RGB and audio modalities. In the work of [400], audio deep learning models were trained by exploiting visual data and acoustic images in a teacher-student fashion.…”
Section: Co-learning With Visual and Sensor Modalities
mentioning, confidence: 99%
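For context on the XDC idea mentioned above, here is a rough, hypothetical sketch of cross-modal pseudo-labeling: cluster one modality's embeddings and use the cluster ids as classification targets for the other modality's encoder. The encoders, dimensions, and cluster count are placeholder assumptions, not the published method.

import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def xdc_step(rgb_encoder, audio_feats, rgb_clips, k=64):
    """One self-supervision round: audio clusters supervise the RGB encoder."""
    # 1. Cluster audio embeddings offline (no gradients flow through k-means).
    assignments = KMeans(n_clusters=k, n_init=10).fit_predict(
        audio_feats.detach().cpu().numpy()
    )
    pseudo_labels = torch.as_tensor(assignments, dtype=torch.long)
    # 2. Train the visual encoder to predict the audio-derived cluster ids.
    logits = rgb_encoder(rgb_clips)  # assumed to output (batch, k) logits
    return F.cross_entropy(logits, pseudo_labels)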