2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178782

An investigation into speaker informed DNN front-end for LVCSR

Abstract: Deep Neural Networks (DNNs) have become a standard method in many ASR tasks. Recently there has been considerable interest in "informed training" of DNNs, where the DNN input is augmented with auxiliary codes such as i-vectors, speaker codes, speaker separation bottleneck (SSBN) features, etc. This paper compares different speaker-informed DNN training methods on an LVCSR task. We discuss the mathematical equivalence between speaker-informed DNN training and "bias adaptation", which uses speaker-dependent biases, and give detaile…
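To make the idea concrete, here is a minimal NumPy sketch (hypothetical code, not from the paper) of a speaker-informed first hidden layer: the auxiliary i-vector is appended to every acoustic frame, and because it is constant across a speaker's frames its contribution collapses into a speaker-dependent bias, which is the equivalence the abstract alludes to. The function and weight names are illustrative assumptions.

```python
import numpy as np

def informed_forward(frames, ivector, W_ac, W_aux, b, nonlin=np.tanh):
    """First hidden layer of a speaker-informed DNN (illustrative sketch).

    frames : (T, D) acoustic features for one speaker
    ivector: (K,)   speaker-level auxiliary code (e.g. an i-vector)
    W_ac   : (H, D) weights on the acoustic input
    W_aux  : (H, K) weights on the auxiliary input
    b      : (H,)   shared bias
    """
    # Augmenting every frame with the same speaker-level i-vector ...
    aug = np.hstack([frames, np.tile(ivector, (frames.shape[0], 1))])
    W = np.hstack([W_ac, W_aux])
    h_informed = nonlin(aug @ W.T + b)

    # ... is mathematically the same as adding a speaker-dependent bias
    # W_aux @ ivector to the pre-activation of a plain acoustic DNN
    # ("bias adaptation" in the abstract's terminology).
    speaker_bias = W_aux @ ivector
    h_bias_adapt = nonlin(frames @ W_ac.T + b + speaker_bias)

    assert np.allclose(h_informed, h_bias_adapt)
    return h_informed
```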

Cited by 17 publications (8 citation statements)
References 23 publications
“…The non-interpolated training PPL for LM 2 RNNLM was found to be 101.4 using the K-component topic-based finetuning compared to 106.7 for using text LDA adaptation and 119.2 when not using adaptation (baseline), whereas the non-interpolated test PPL was: 157.2, 136.4 and 150.8 respectively, which shows overfitting of the K-Component RNNLM. Our results with the introduction of a domain-specific adaptation layer showed that using an adaptation layer with additive bias adaptation (feature-based adaptation layer), better results are obtained than when using a multiplicative transform (LHN adaptation layer), which is in line with similar observations in acoustic modelling [52]. For LM 1&LM 2 RNNLMs, an additive transform gives a WER of 28.7% whereas using a multiplicative transform leads to a WER of 28.9%.…”
Section: Feature-based Adaptation Results (supporting)
confidence: 87%
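For readers comparing the two adaptation forms discussed in this excerpt, the following NumPy fragment is a minimal sketch, under assumed shapes, of an additive (feature-based/bias) adaptation layer versus a multiplicative (LHN-style) transform; it is illustrative only and not code from either cited work.

```python
import numpy as np

def additive_adaptation(h, bias_d):
    """Feature-based / bias adaptation: add a domain-dependent offset
    to the hidden activations (one bias vector per domain)."""
    return h + bias_d                 # h: (T, H), bias_d: (H,)

def lhn_adaptation(h, A_d):
    """LHN-style adaptation: multiply the hidden activations by a
    domain-dependent linear transform (one H x H matrix per domain)."""
    return h @ A_d.T                  # A_d: (H, H)

# The additive variant introduces only H extra parameters per domain,
# whereas the LHN transform introduces H*H, which is one reason the
# bias form can be easier to estimate from limited adaptation data.
```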
“…Making complex ASR systems available was originally the intent of webASR, and, as such, ASR remains the main task in 3 newly developed systems covering 3 domains. All of them present a state-of-the-art speech transcription system, based on the latest research carried out at the University of Sheffield in topics such as Deep Neural Network (DNN) acoustic modelling [11,12,13,14], distant microphone recognition [15], adaptation to noisy environments [16,17,18], domain adaptation [19,20], Recurrent Neural Network (RNN) language modelling [21], Nbest re-ranking [22,23] and sentence-end detection [24,25].…”
Section: Transcription Systems (mentioning)
confidence: 99%
“…Finally, domain information derived from the LDA model with K domains is encoded with a K-dimensional one-hot vector called Unique Binary Index Code (UBIC) [14]. UBIC indicates the most likely domain of the utterance using the posterior domain probability.…”
Section: LDA-DNN Adaptation (mentioning)
confidence: 99%
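A minimal sketch of how a UBIC-style one-hot code could be derived from LDA domain posteriors is shown below; the helper name and shapes are assumptions for illustration, not the cited authors' implementation.

```python
import numpy as np

def ubic(domain_posteriors):
    """Unique Binary Index Code: a K-dimensional one-hot vector marking
    the most likely LDA domain of an utterance.

    domain_posteriors : (K,) posterior probability of each domain
    """
    code = np.zeros_like(domain_posteriors)
    code[np.argmax(domain_posteriors)] = 1.0
    return code

# Example: with posteriors [0.1, 0.7, 0.2] the UBIC is [0., 1., 0.],
# which would then be appended to the DNN input features.
```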
“…Subspace methods: a speaker/environment subspace is estimated and then neurons' weights or transformations are computed, based on the subspace representation of the speaker/environment. Principal Component Analysis (PCA) based adaptation approach [12], i-Vector based speaker-aware training [13] or speaker-aware DNNs [14] can be considered as subspace methods.…”
Section: Introduction (mentioning)
confidence: 99%
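As a rough illustration of the subspace idea described in this excerpt, the sketch below uses scikit-learn's PCA to build a low-dimensional speaker subspace from per-speaker summary vectors and treats the projection as an auxiliary speaker code; this is a simplified, assumption-laden example, not the estimation procedure used in the cited papers.

```python
import numpy as np
from sklearn.decomposition import PCA

def speaker_subspace_codes(speaker_stats, n_dims=10):
    """Project per-speaker summary vectors (e.g. mean acoustic features
    or supervectors) onto a low-dimensional subspace estimated by PCA.

    speaker_stats : (S, D) one summary vector per training speaker
    returns       : the fitted PCA model and (S, n_dims) speaker codes
    """
    pca = PCA(n_components=n_dims)
    codes = pca.fit_transform(speaker_stats)
    return pca, codes

# At test time a new speaker's summary vector is projected with
# pca.transform(...) and the resulting code is fed to the DNN as an
# auxiliary input, analogous to an i-vector in speaker-aware training.
```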