2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 2007
DOI: 10.1109/icassp.2007.367029

Modelling Pronunciation Variation using Multi-Path HMMs for Syllables

Abstract: The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/44459 Please be advised that this information was generated on 2024-06-02 and may be subject to change. Article 25fa End User Agreement. This publication is distributed under the terms of Article 25fa of the Dutch Copyright Act. This article entitles the maker of a short scientific work funded either wholly or partially by Dutch public funds to make that work publicly avai…

Cited by 7 publications (13 citation statements)
References: 6 publications
“…In general, pronunciation variants can either be extracted from a large corpus that has already been transcribed manually at the segmental level (data-driven approach, e.g., Hämäläinen et al, 2007; Kessens et al, 2003) or they can be generated by applying a set of rules, proposed in the phonological/phonetic literature, to the canonical forms in the lexicon (knowledge-based approach, e.g., Van Bael et al, 2007b). The set of variants derived with a data-driven approach depends on the corpus from which the variants are extracted and tends to contain fewer pronunciation variants for most words than a lexicon created with the knowledge-based approach.…”
Section: Generation Of Pronunciation Variants (mentioning)
confidence: 99%
“…Previous research into speech recognition using syllables has been mostly based on replacing or combining the triphone-HMMs with models of the same type for syllables (Hetjmánek and Pavelka 2008; Hämäläinen et al 2007; Han et al 2006; Sethy et al 2003; Ogata and Ariki 2003; Ganapathiraju et al 2001; Ahadi 2000; Wu et al 1998; Dupont and Bourlard 1997). Where large vocabulary tasks have been attempted, data sparsity has been fixed by backoff to phone sequences (Sethy et al 2003; Ganapathiraju et al 2001).…”
Section: Syllable Acoustic Models (mentioning)
confidence: 98%
“…In fact, some of the retrained parallel paths were still closely related to the MDVs used to initialise them. In Hämäläinen et al (2007b), we carried out a forced alignment of the training data with the multi-path mixed-model recogniser and analysed the training tokens assigned to each path of the syllable models. Our analysis showed that the token-to-path assignment was clearly related to the articulatory similarity - or dissimilarity - between the transcriptions of the training tokens and the MDVs used to initialise the parallel paths.…”
Section: Discussion (mentioning)
confidence: 99%
“…Three parallel paths per syllable appeared a good compromise between too little training data and too small a distance between the triphone sequences used to initialise the paths. Our assumption about the optimal number of paths could later be verified by carrying out a forced alignment of the training data with the syllable models; the majority of the paths were frequently entered (Hämäläinen et al, 2007b). In addition, removing the paths that were rarely used during the forced alignment showed that the recognition performance remained virtually unchanged.…”
Section: 2 Selection Of Major Distinct Transcription Variants For (mentioning)
confidence: 99%