2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
DOI: 10.1109/asru.2015.7404785
Latent Dirichlet Allocation based organisation of broadcast media archives for deep neural network adaptation

Abstract: This paper presents a new method for the discovery of latent domains in diverse speech data, for use in the adaptation of Deep Neural Networks (DNNs) for Automatic Speech Recognition. Our work focuses on the transcription of multi-genre broadcast media, which is often categorised only broadly in terms of high-level genres such as sports, news, documentary, etc. However, in terms of acoustic modelling these categories are coarse. Instead, it is expected that a mixture of latent domains can better represent the comp…

Cited by 10 publications (17 citation statements)
References 28 publications
“…All of the in-domain data was used for training the aLDA model with the procedure described in Section 2, using a vocabulary size of 1024 (the number of Gaussian mixture components) and 2048 latent domains. Both values were selected based on our previous experiments [4,5]. The trained aLDA model was then used to obtain the posterior Dirichlet parameter γ for all of the utterances in the training, dev, and test sets.…”
Section: aLDA Data Selection
confidence: 99%
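As a concrete illustration of that pipeline, the sketch below quantises acoustic frames with a Gaussian mixture so that each utterance becomes a bag of "acoustic words", trains an LDA model on those bags, and infers γ per utterance. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names, toy data, and the use of scikit-learn and gensim are all assumptions, and the tiny model sizes stand in for the paper's 1024 Gaussians and 2048 latent domains.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from gensim.models import LdaModel

rng = np.random.default_rng(0)
# Toy stand-ins for acoustic features: one (n_frames, n_dims) array per utterance.
train_utts = [rng.normal(size=(200, 13)) for _ in range(20)]
test_utts = [rng.normal(size=(150, 13)) for _ in range(5)]

def utterances_to_bows(utterances, gmm):
    """Map each utterance to a gensim-style bag of 'acoustic words' (Gaussian indices)."""
    bows = []
    for frames in utterances:
        ids = gmm.predict(frames)                        # Gaussian index per frame
        counts = np.bincount(ids, minlength=gmm.n_components)
        bows.append([(i, int(c)) for i, c in enumerate(counts) if c > 0])
    return bows

# The paper uses 1024 Gaussians and 2048 latent domains; tiny values keep this toy runnable.
gmm = GaussianMixture(n_components=64, covariance_type="diag", random_state=0)
gmm.fit(np.vstack(train_utts))
alda = LdaModel(corpus=utterances_to_bows(train_utts, gmm), num_topics=8, passes=5)

# Posterior Dirichlet parameter gamma: one row per utterance.
gamma, _ = alda.inference(utterances_to_bows(test_utts, gmm))
```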
“…* Core part of this work was performed while the author was studying at the University of Sheffield. In this paper we propose to use acoustic Latent Dirichlet Allocation (aLDA) for matching acoustically similar data to the limited in-domain data from a pool of diverse data. aLDA has already been applied to domain discovery [3] and domain adaptation [4] in automatic speech recognition, as well as to media entity recognition, such as show and genre identification in information retrieval systems for media archives [5,6,7].…”
Section: Introduction
confidence: 99%
“…Kim et al [8] used whole shows to train the LDA models and used the domain posteriors as features for an SVM classifier. In this work we followed our previous setup [18,21], where only speech segments are used to train the LDA model. For each show, the domain posteriors of its segments were accumulated, length-normalised, and used as features for the discriminative classifier in the later stage. It is important to note that this dataset is orders of magnitude larger than most of the datasets used in the literature for the genre ID task [2,4,5,6,8,15].…”
Section: Acoustic Latent Dirichlet Allocation
confidence: 99%
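The show-level feature construction described there can be sketched as follows: accumulate per-segment domain posteriors over each show, normalise, and feed the result to a discriminative classifier (an SVM here, matching the quote). Weighting each segment by its length is one reading of "length normalised" and is an assumption, as are the variable names.

```python
import numpy as np
from sklearn.svm import LinearSVC

def show_feature(segment_posteriors, segment_lengths):
    """Accumulate per-segment domain posteriors over a show, weighted by
    segment length (one reading of 'length normalised'), then renormalise."""
    acc = (np.asarray(segment_posteriors) * np.asarray(segment_lengths)[:, None]).sum(axis=0)
    return acc / acc.sum()

# shows: list of (segment_posteriors, segment_lengths) pairs; labels: genre per show
# X = np.stack([show_feature(p, l) for p, l in shows])
# clf = LinearSVC().fit(X, labels)
```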
“…It has been found that feature-based adaptation of RNNLMs, by augmenting the input with domain-specific auxiliary features, provides significant improvements in both perplexity (PPL) and word error rate (WER) [8,2,9,10,4,6,11]. Such features, however, can also include acoustic embeddings [12,13] derived from audio, which might be available for only a subset of the text data, such as the matched in-domain data used for fine-tuning. In such cases, semi-supervised adaptation approaches can be used.…”
Section: Introduction
confidence: 99%
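The input-augmentation idea in that quote can be illustrated with a short PyTorch sketch: a fixed auxiliary domain (or acoustic-embedding) vector is concatenated to the word embedding at every time step before the recurrent layer. The class name, layer sizes, and LSTM choice are assumptions for illustration, not the cited papers' exact architectures.

```python
import torch
import torch.nn as nn

class AdaptedRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim, aux_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim + aux_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, aux):
        # words: (batch, seq) token ids; aux: (batch, aux_dim) domain feature per utterance
        emb = self.embed(words)
        aux_rep = aux.unsqueeze(1).expand(-1, emb.size(1), -1)   # tile aux over time steps
        hidden, _ = self.rnn(torch.cat([emb, aux_rep], dim=-1))
        return self.out(hidden)                                  # next-word logits

model = AdaptedRNNLM(vocab_size=1000, emb_dim=128, aux_dim=64, hidden_dim=256)
logits = model(torch.randint(0, 1000, (8, 20)), torch.randn(8, 64))  # (8, 20, 1000)
```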