2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
DOI: 10.1109/icassp.2000.859128
Fast speaker adaptation of large vocabulary continuous density HMM speech recognizer using a basis transform approach

Cited by 6 publications (4 citation statements); References 6 publications.
“…An extension of the MLLR technique, in which multiple independent transformations are combined, was proposed by Digalakis and colleagues [21,22,46,47]. For speaker adaptation, adaptation transforms are first estimated for typical speaker groups in the training material.…”
(mentioning; confidence: 99%)
“…A first version of BT is presented in Boulis and Digalakis (2000). The algorithm was evaluated on the Swedish ATIS corpus, where the speaker-independent system is trained on non-dialect speakers.…”
Section: The Basis Transformation Approach (mentioning; confidence: 99%)
“…A closely related problem to nonnative speaker adaptation is regional dialect speaker adaptation. Digalakis et al. investigated adapting acoustic models to fit speakers with dialect accents [12], [13]. In [12], Maximum Likelihood Stochastic Transformation (MLST) was proposed to estimate multiple linear transforms for each model cluster in model adaptation.…”
Section: Introduction (mentioning; confidence: 99%)
“…Although a significant performance improvement was achieved, much more data were needed than for MLLR, which estimates only one linear transform per model cluster. In [13], to achieve good performance when adaptation data were sparse, speech data from prototype speakers in the target dialect regions were used to generate a set of basis linear transformations, and a small amount of the new speaker's speech was used to estimate the transform combination weights. In their experiments on Swedish dialect speaker adaptation, the adaptation performance greatly exceeded that of MLLR when the amount of adaptation data was very small.…”
Section: Introduction (mentioning; confidence: 99%)
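The basis-transform idea summarized in this excerpt can be sketched in a few lines. The following is a hypothetical illustration, not the cited paper's implementation: the basis transforms are random stand-ins for those that would be estimated from prototype dialect speakers, and the combination weights are fit here by least squares rather than by the maximum-likelihood estimation used in the cited work.

```python
import numpy as np

# Illustrative sketch: a new speaker's adapted Gaussian mean is a weighted
# combination of basis-transformed speaker-independent means, where the
# basis transforms (A_k, b_k) stand in for those built from prototype
# dialect speakers, and only the weights w_k are estimated for the new
# speaker (from a small amount of adaptation data).

rng = np.random.default_rng(0)
dim, n_basis = 3, 4

mu_si = rng.normal(size=dim)                       # speaker-independent mean
basis = [(rng.normal(size=(dim, dim)), rng.normal(size=dim))
         for _ in range(n_basis)]                  # stand-in basis transforms

# Each column is one basis-transformed mean: A_k @ mu + b_k.
candidates = np.stack([A @ mu_si + b for A, b in basis], axis=1)

# Pretend sparse adaptation data yields this target mean estimate.
mu_target = candidates @ np.array([0.5, 0.3, 0.1, 0.1])

# Estimate combination weights by least squares (an illustrative stand-in
# for maximum-likelihood weight estimation).
w, *_ = np.linalg.lstsq(candidates, mu_target, rcond=None)

mu_adapted = candidates @ w
print(np.allclose(mu_adapted, mu_target))          # → True
```

Because only the few weights (here 4) are estimated per speaker, rather than full transform matrices, very little adaptation data is needed, which is the advantage over MLLR that the excerpt describes.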