2007
DOI: 10.1016/j.specom.2006.11.005

Linear hidden transformations for adaptation of hybrid ANN/HMM models

Abstract: This paper focuses on the adaptation of Automatic Speech Recognition systems using Hybrid models combining Artificial Neural Networks (ANN) with Hidden Markov Models (HMM). Most adaptation techniques for ANNs reported in the literature consist in adding a linear transformation network connected to the input of the ANN. This paper describes the application of linear transformations not only to the input features, but also to the outputs of the internal layers. The motivation is that the outputs of an internal layer…
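The linear-hidden-transformation idea summarised in the abstract can be sketched in a few lines. The following is a minimal, hypothetical PyTorch illustration, not the paper's implementation: layer sizes, names, and the training setup are assumptions. An identity-initialised affine layer is inserted after a hidden layer of a toy hybrid ANN/HMM acoustic model, and only that layer is updated during speaker adaptation.

```python
import torch
import torch.nn as nn

class HybridAcousticModel(nn.Module):
    # Toy feed-forward acoustic model for a hybrid ANN/HMM system (sizes assumed).
    def __init__(self, n_feats=39, n_hidden=500, n_states=1000):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_feats, n_hidden), nn.Sigmoid())
        # Linear transformation applied to the hidden-layer outputs,
        # initialised to the identity so adaptation starts from the
        # speaker-independent model.
        self.lhn = nn.Linear(n_hidden, n_hidden)
        nn.init.eye_(self.lhn.weight)
        nn.init.zeros_(self.lhn.bias)
        self.output = nn.Linear(n_hidden, n_states)  # unnormalised HMM state scores

    def forward(self, x):
        return self.output(self.lhn(self.hidden(x)))

model = HybridAcousticModel()

# Speaker adaptation (one possible setup): freeze the speaker-independent
# weights and update only the inserted linear hidden transformation.
for p in model.parameters():
    p.requires_grad_(False)
for p in model.lhn.parameters():
    p.requires_grad_(True)
optimizer = torch.optim.SGD(model.lhn.parameters(), lr=0.01)
```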

Cited by 140 publications (79 citation statements)
References 18 publications
“…The first category performs speaker normalisation at signal level, such as Vocal Tract Length Normalisation (VTLN [1]), or speaker transformation at feature level, such as feature-MLLR (fMLLR [3]). The second category introduces speaker-dependent discriminative transformations into DNN structures, for example Linear Input Network (LIN [5]), Linear Output Network (LON [6]), Linear Hidden Layer (LHN [7]) and feature-space Discriminative Linear Regression (fDLR [3]). The third category, "informed DNN training", informs DNNs with meta-information during the training process by augmenting the DNN input with auxiliary codes that carry speaker information.…”
Section: Introductionmentioning
confidence: 99%
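As a hedged illustration of the third category mentioned in the statement above ("informed DNN training"), the sketch below augments the acoustic input with an auxiliary speaker code before it enters the DNN. The dimensions, the random placeholder code, and the network itself are assumptions for illustration, not details from the cited work.

```python
import torch
import torch.nn as nn

n_feats, n_code, n_hidden, n_states = 39, 100, 500, 1000   # assumed sizes

# DNN whose input layer sees the acoustic features concatenated with the code.
dnn = nn.Sequential(
    nn.Linear(n_feats + n_code, n_hidden),
    nn.Sigmoid(),
    nn.Linear(n_hidden, n_states),   # unnormalised HMM state scores
)

frames = torch.randn(8, n_feats)        # a batch of acoustic frames (dummy data)
speaker_code = torch.randn(1, n_code)   # placeholder auxiliary speaker code
augmented = torch.cat([frames, speaker_code.expand(8, -1)], dim=1)
scores = dnn(augmented)                 # shape: (8, n_states)
```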
“…The first two systems in the list are based on GMM-HMM acoustic models: the first was trained using the maximum-likelihood (ML) criterion [17], while the second was trained using the maximum mutual information (MMI) criterion with vocal tract length normalization (VTLN) [16]. Triefenbach et al. [17] also proposed a Reservoir Computing (RC) HMM hybrid system for phoneme recognition using a bigram phonotactic utterance model. The RC-HMM performs significantly better than the MLP-HMM hybrids proposed by Gemello et al. [19]. However, it is still outperformed by the GMM system with VTLN.…”
Section: Read Continuous Speech Recognition Taskmentioning
confidence: 73%
“…Model transformation based adaptation [103,36,80,150]: in the model transformation approach, the network is augmented with an affine transformation network at the input layer, a hidden layer, or the output layer. [Table 6.14: Similarity graph embedding features for LSTM-CTC systems on SVB-10k.]…”
Section: Connections To Speaker Adaptationmentioning
confidence: 99%
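The statement above describes augmenting a network with an affine transformation at the input layer, a hidden layer, or the output layer. The sketch below shows, with an assumed base network and layer sizes, where such identity-initialised affine layers would sit in the LIN, LHN, and LON styles; the exact placement of the output-side transform varies across papers, so this is only one plausible reading.

```python
import torch.nn as nn

def identity_affine(dim):
    # Affine layer initialised to the identity, so the augmented network
    # behaves like the speaker-independent one before adaptation.
    layer = nn.Linear(dim, dim)
    nn.init.eye_(layer.weight)
    nn.init.zeros_(layer.bias)
    return layer

n_feats, n_hidden, n_states = 39, 500, 1000                  # assumed sizes
hidden = nn.Sequential(nn.Linear(n_feats, n_hidden), nn.Sigmoid())
output = nn.Linear(n_hidden, n_states)

# Input-layer augmentation (LIN-style): transform the input features.
lin = nn.Sequential(identity_affine(n_feats), hidden, output)
# Hidden-layer augmentation (LHN-style): transform the hidden activations.
lhn = nn.Sequential(hidden, identity_affine(n_hidden), output)
# Output-side augmentation (LON-style): transform the output activations.
lon = nn.Sequential(hidden, output, identity_affine(n_states))
```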