ASR Performance Prediction on Unseen Broadcast Programs Using Convolutional Neural Networks

Elloumi, Zied; Besacier, Laurent; Galibert, Olivier; Kahn, Juliette; Lecouteux, Benjamin

doi:10.1109/icassp.2018.8461751

Cited by 6 publications (26 citation statements); references 13 publications


“…In this section, we attempt to understand what our best ASR performance prediction system (Elloumi et al, 2018) learned. We analyze the text and speech representations obtained by our architecture.…”

Section: Evaluating Learned Representations, 4.1 Methodology (mentioning)

“…In (Elloumi et al, 2018), we proposed a new approach using convolutional neural networks (CNNs) to predict ASR performance from a collection of heterogeneous broadcast programs (both radio and TV). We particularly focused on the combination of text (ASR transcription) and signal (raw speech) inputs which both proved useful for CNN prediction.…”
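The quoted description combines a text branch (ASR transcript) and a signal branch (raw speech) in one CNN that regresses ASR performance. The NumPy forward pass below is a minimal sketch of that two-branch idea, not the authors' architecture: the single convolution per branch, stride 1, max-over-time pooling, layer sizes, random untrained weights, and all names (`conv1d_relu`, `predict_wer`, `init_params`) are illustrative assumptions.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over time followed by ReLU.

    x: (T, C_in) sequence; kernels: (K, C_in, C_out); bias: (C_out,).
    Returns an array of shape (T - K + 1, C_out).
    """
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    out = np.stack([
        np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1]))
        for t in range(steps)
    ]) + bias
    return np.maximum(out, 0.0)

def branch(x, params):
    """One branch: convolution + ReLU, then max-over-time pooling to a fixed-size vector."""
    return conv1d_relu(x, params["kernels"], params["bias"]).max(axis=0)

def predict_wer(word_embeddings, speech_samples, params):
    """Concatenate text and speech vectors and apply a linear regression head (hypothetical)."""
    text_vec = branch(word_embeddings, params["text"])
    speech_vec = branch(speech_samples, params["speech"])
    joint = np.concatenate([text_vec, speech_vec])
    return float(joint @ params["head_w"] + params["head_b"])

def init_params(rng, emb_dim=50, n_filters=16, text_width=3, speech_width=64):
    """Random, untrained weights; sizes are arbitrary assumptions."""
    def conv(width, c_in):
        return {"kernels": rng.normal(scale=0.1, size=(width, c_in, n_filters)),
                "bias": np.zeros(n_filters)}
    return {
        "text": conv(text_width, emb_dim),
        "speech": conv(speech_width, 1),
        "head_w": rng.normal(scale=0.1, size=2 * n_filters),
        "head_b": 0.0,
    }

rng = np.random.default_rng(0)
params = init_params(rng)
words = rng.normal(size=(12, 50))    # 12 transcript words, 50-dim embeddings
speech = rng.normal(size=(1600, 1))  # 0.1 s of 16 kHz audio, one channel
print(predict_wer(words, speech, params))  # untrained, so the value is meaningless
```

Pooling over time lets transcripts and signals of any length map to fixed-size vectors, which is what makes concatenating the two modalities straightforward.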

Section: ASR Performance Prediction System (mentioning)

“…The TEST set contains unseen broadcast programs that are different from those present in TRAIN and DEV (Elloumi et al, 2018). Tables 1 and 2 show the whole data set in terms of speech turns available for each classification task.…”
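Because TEST must contain only programs absent from TRAIN and DEV, utterances are held out by program rather than at random. A minimal pure-Python sketch, assuming each utterance is a dict with a `program` key; the schema, the function names, and the random utterance-level TRAIN/DEV split of the remaining programs are hypothetical (the snippet does not say whether DEV shares programs with TRAIN).

```python
import random

def split_unseen_programs(utterances, test_programs, dev_fraction=0.1, seed=0):
    """TEST receives every utterance of the held-out programs; the rest is shuffled into TRAIN/DEV."""
    test = [u for u in utterances if u["program"] in test_programs]
    seen = [u for u in utterances if u["program"] not in test_programs]
    random.Random(seed).shuffle(seen)
    n_dev = int(len(seen) * dev_fraction)
    return seen[n_dev:], seen[:n_dev], test

def programs(split):
    """Set of program identifiers appearing in a split."""
    return {u["program"] for u in split}

# Hypothetical programs: 6, 4 and 5 utterances respectively.
utts = [{"id": i, "program": p}
        for i, p in enumerate(["news_a"] * 6 + ["talk_b"] * 4 + ["debate_c"] * 5)]
train, dev, test = split_unseen_programs(utts, {"debate_c"}, dev_fraction=0.2)
print(len(train), len(dev), len(test))  # 8 2 5
```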

Section: Data (mentioning)

“…Our goal is to better understand which information is captured by the deep model and its relation with conditioning factors such as speech style, accent or broadcast program type. For this, we use a data set presented in (Elloumi et al, 2018) which contains a large amount of speech utterances taken from various collections of French broadcast programs. Following a methodology similar to (Belinkov and Glass, 2017), our deep performance prediction model is used to generate utterance level features that are given to a shallow classifier trained to solve secondary classification tasks.…”
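The probing methodology keeps the deep model frozen and trains only a shallow classifier on its utterance-level features. The snippet does not name the classifier, so the sketch below uses softmax (multinomial logistic) regression trained by gradient descent as one simple choice; the synthetic features stand in for pooled hidden-layer vectors, and all names and hyperparameters are hypothetical.

```python
import numpy as np

def train_probe(features, labels, n_classes, lr=0.1, epochs=300):
    """Fit a softmax-regression probe on frozen features.

    features: (n, d) utterance vectors from the frozen model; labels: (n,) integer class ids
    for a secondary task (e.g. speech style). Only W and b are learned here.
    """
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, features, labels):
    """Share of utterances whose secondary-task label the probe recovers."""
    return float(np.mean(np.argmax(features @ W + b, axis=1) == labels))

rng = np.random.default_rng(0)
# Synthetic stand-in for hidden-layer vectors of two speech styles.
feats = np.vstack([rng.normal(-1.0, 1.0, size=(100, 8)), rng.normal(1.0, 1.0, size=(100, 8))])
labels = np.array([0] * 100 + [1] * 100)
W, b = train_probe(feats[::2], labels[::2], n_classes=2)  # even rows train the probe
print(probe_accuracy(W, b, feats[1::2], labels[1::2]))     # odd rows give held-out accuracy
```

High held-out accuracy suggests the features encode the secondary property; a weak probe is used deliberately so that the information must already be linearly accessible in the representation.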

Section: Introduction (mentioning)

Analyzing Learned Representations of a Deep ASR Performance Prediction Model

Elloumi, Besacier, Galibert et al., 2018

Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Self-citation

This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech in order to predict word error rate. This work is dedicated to the analysis of the speech signal embeddings and text embeddings learnt by the CNN while training our prediction model. We try to better understand which information is captured by the deep model and how it relates to different conditioning factors. We show that hidden layers convey a clear signal about speech style, accent and broadcast type. We then try to leverage these three types of information at training time through multi-task learning. Our experiments show that this allows us to train slightly more efficient ASR performance prediction systems that, in addition, simultaneously tag the analyzed utterances according to their speech style, accent and broadcast program origin.
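The abstract says only that speech style, accent and broadcast type are leveraged through multi-task learning. One common way to do this, sketched below under assumptions the source does not confirm, is to add a cross-entropy term per auxiliary tagging head to the main WER regression loss. The absolute-error main loss, the single shared weight `task_weight`, and the task names are hypothetical.

```python
import math

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-probability of the gold class (eps avoids log(0))."""
    return -math.log(max(probs[label], eps))

def multitask_loss(wer_pred, wer_true, task_probs, task_labels, task_weight=1.0):
    """Main WER regression loss plus weighted cross-entropy of auxiliary tagging heads.

    task_probs: task name -> predicted class distribution (e.g. "style", "accent", "program").
    task_labels: task name -> gold class index.
    """
    main = abs(wer_pred - wer_true)  # absolute error on WER (assumed choice)
    aux = sum(cross_entropy(task_probs[t], task_labels[t]) for t in task_probs)
    return main + task_weight * aux

loss = multitask_loss(0.30, 0.25,
                      {"style": [0.2, 0.8], "accent": [0.9, 0.1]},
                      {"style": 1, "accent": 0})
print(loss)
```

Sharing the encoder between the WER head and the tagging heads is what lets the auxiliary labels shape the representation; `task_weight` would control how strongly.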


“…ASR system performance. 3.3.1 ASR system. The ASR system used in this work is described in [13]. It uses the KALDI toolkit [25], following a standard Kaldi recipe.…”

Section: Gender Bias Evaluation Procedures Of An (mentioning)

Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance

Garnerin, Rossato, Besacier, 2019

Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery

Self-citation

This paper analyzes gender representation in four major corpora of French broadcast speech. Because these corpora are widely used within the speech processing community, they are primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of both speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category, which corresponds to prominent speakers. The disparity in available data between genders causes performance to decrease on women. However, this global trend can be counterbalanced for speakers who are used to speaking in the media when a sufficient amount of data is available.
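Comparing performance across genders amounts to computing an error rate separately per group. A minimal pure-Python sketch, assuming performance is measured as word error rate (the metric predicted in the work above), pooling word errors and reference words over speech turns; the `(group, reference, hypothesis)` tuple layout and the example turns are hypothetical, and this is not the paper's scoring tool.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance (substitutions + deletions + insertions) and reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of reference word r
                         cur[j - 1] + 1,           # insertion of hypothesis word h
                         prev[j - 1] + (r != h))   # substitution, or match at no cost
        prev = cur
    return prev[-1], len(ref)

def wer_by_group(turns):
    """turns: iterable of (group, reference, hypothesis). Returns group -> pooled WER."""
    errors, words = {}, {}
    for group, ref, hyp in turns:
        e, n = word_errors(ref, hyp)
        errors[group] = errors.get(group, 0) + e
        words[group] = words.get(group, 0) + n
    return {g: errors[g] / words[g] for g in errors if words[g]}  # skip groups with no reference words

print(wer_by_group([("female", "le journal de midi", "le journal midi"),
                    ("male", "bonsoir à tous", "bonsoir à tous")]))
# {'female': 0.25, 'male': 0.0}
```

Pooling errors before dividing weights long turns more than short ones, which differs from averaging per-turn WERs.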


Dissecting Span Identification Tasks with Performance Prediction

Papay, Klinger, Padó, 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Span identification (in short, span ID) tasks such as chunking, NER, or code-switching detection ask models to identify and classify relevant spans in a text. Despite being a staple of NLP and sharing a common structure, there is little insight into how these tasks' properties influence their difficulty, and thus little guidance on which model families work well on span ID tasks, and why. We analyze span ID tasks via performance prediction, estimating how well neural architectures do on different tasks. Our contributions are: (a) we identify key properties of span ID tasks that can inform performance prediction; (b) we carry out a large-scale experiment on English data, building a model to predict performance for unseen span ID tasks that can support architecture choices; (c) we investigate the parameters of the meta model, yielding new insights on how model and task properties interact to affect span ID performance. We find, for example, that span frequency is especially important for LSTMs, and that CRFs help when spans are infrequent and boundaries non-distinctive.
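The abstract does not state the functional form of the meta model. As a purely illustrative stand-in, the sketch fits an ordinary-least-squares regression from per-task properties to a per-task score; the property columns (for instance span frequency and a boundary-distinctiveness score) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_meta_model(task_features, scores):
    """Least-squares linear meta model: predict a performance score from task properties.

    task_features: (n_tasks, n_props) array; scores: (n_tasks,) array.
    Returns (weights, intercept).
    """
    X = np.hstack([task_features, np.ones((len(task_features), 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef[:-1], coef[-1]

def predict_meta(weights, intercept, task_features):
    """Predicted score for unseen tasks described by the same properties."""
    return task_features @ weights + intercept

rng = np.random.default_rng(0)
props = rng.uniform(size=(20, 2))  # hypothetical: [span frequency, boundary distinctiveness]
scores = 0.5 * props[:, 0] - 0.2 * props[:, 1] + 0.3  # synthetic, noise-free
w, c = fit_meta_model(props, scores)
print(w, c)
```

Inspecting the fitted weights is the linear analogue of point (c): each coefficient indicates how a task property moves predicted performance.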
