2005
DOI: 10.1016/j.specom.2005.03.004

Pronunciation modeling using a finite-state transducer representation

Abstract: The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be trained using an EM algorithm for finite-state networks. This paper details our modeling approach and demonstrates its benefits and weaknesses, both conceptually and empirically, using the recognizer for our jup…
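To make the FST idea in the abstract concrete, here is a minimal sketch, not the SUMMIT implementation, of composing a baseform with a weighted phonological rule transducer. Everything in it is an illustrative assumption: the word "water" and its baseform /w ao t er/, the class and function names (`FST`, `compose`, `paths`), and the 0.6/0.4 flapping probabilities. The rule is context-independent for brevity, whereas real flapping rules condition on neighboring phones. The weights are fixed by hand rather than EM-trained, and the composition handles only epsilon-free machines.

```python
import math
from collections import defaultdict


class FST:
    """Minimal epsilon-free weighted FST (hypothetical helper). Weights are
    costs (negative log probabilities) that add along a path, as in the
    tropical semiring."""

    def __init__(self, start, finals=()):
        self.start = start
        self.finals = set(finals)
        # state -> list of (input_label, output_label, cost, next_state)
        self.arcs = defaultdict(list)

    def add_arc(self, src, ilabel, olabel, cost, dst):
        self.arcs[src].append((ilabel, olabel, cost, dst))


def compose(a, b):
    """Epsilon-free composition: an arc of `a` pairs with an arc of `b`
    when a's output label equals b's input label; their costs add."""
    start = (a.start, b.start)
    result = FST(start)
    stack, seen = [start], {start}
    while stack:
        qa, qb = stack.pop()
        if qa in a.finals and qb in b.finals:
            result.finals.add((qa, qb))
        for ia, oa, wa, na in a.arcs[qa]:
            for ib, ob, wb, nb in b.arcs[qb]:
                if oa == ib:
                    nxt = (na, nb)
                    result.add_arc((qa, qb), ia, ob, wa + wb, nxt)
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return result


def paths(fst):
    """List (input, output, cost) for every accepting path (FST must be acyclic)."""
    found = []

    def walk(state, ins, outs, cost):
        if state in fst.finals:
            found.append((" ".join(ins), " ".join(outs), cost))
        for i, o, w, nxt in fst.arcs[state]:
            walk(nxt, ins + [i], outs + [o], cost + w)

    walk(fst.start, [], [], 0.0)
    return found


# Hypothetical baseform for "water", encoded as a linear phoneme acceptor.
baseform = FST(0, finals=[4])
for k, ph in enumerate(["w", "ao", "t", "er"]):
    baseform.add_arc(k, ph, ph, 0.0, k + 1)

# Toy one-state rule transducer (assumed probabilities): /t/ surfaces as the
# flap [dx] with p=0.6 or as [t] with p=0.4; other phonemes pass through at zero cost.
rules = FST(0, finals=[0])
for ph in ["w", "ao", "er"]:
    rules.add_arc(0, ph, ph, 0.0, 0)
rules.add_arc(0, "t", "t", -math.log(0.4), 0)
rules.add_arc(0, "t", "dx", -math.log(0.6), 0)

variants = sorted(paths(compose(baseform, rules)), key=lambda r: r[2])
for phonemes, phones, cost in variants:
    print(f"/{phonemes}/ -> [{phones}]  cost={cost:.3f}")
```

The composed machine yields both surface forms, with the flapped variant ranked first because it has the lower cost. Under the same assumptions, a full system would compose the entire lexicon with several rule transducers and learn the arc weights from data instead of fixing them by hand.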

Cited by 32 publications (24 citation statements)
References 18 publications

“…In particular, the interactions between subword modeling, observation modeling, and the choice of acoustic observations deserve more study. For example, phonetic dictionary expansion may affect different systems differently (e.g., possibly achieving greater improvements in a segment-based recognizer [33] than in HMM-based recognizers [30], [10]), but to our knowledge there have been no direct comparisons on identical tasks and data sets. We have also only briefly touched on automatic sub-word unit learning and the related task of automatic dictionary learning [39], [40], [47].…”
Section: Discussion (mentioning)
“…This led to a great deal of activity on modeling pronunciation variation, including two workshops sponsored by the International Speech Communication Association [27], [28]. The majority (but by no means all) of the proposed approaches during this period kept the phone as the basic sub-word unit, and focused on ways of predicting the possible phonetic sequences for any given word using phonological rules or other means [29], [30], [31], [32], [33], [7], [34], [10].…”
Section: A. Dictionary Expansion (mentioning)