Prosody dependent speech recognition on radio news corpus of American English

Chen, Ken; Hasegawa‐Johnson, Mark; Cohen, A. S.; Borys, Sarah; Kim, Sung-Suk; Cole, Jennifer; Choi, Jeung‐Yoon

doi:10.1109/tsa.2005.853208

Cited by 38 publications

(32 citation statements)

References 23 publications

(39 reference statements)


“…The word to be predicted is more likely to be "witch" instead of "which" if an accent is predicted from the current word-prosody context. In the results reported by (Chen et al, 2006), a prosody dependent language model can significantly improve word recognition accuracy over a prosody independent language model, given the same acoustic model. N-gram models can be conveniently used for prosody dependent language modeling.…”

Section: [ŵ ] = Arg Max P(o | Wp) P(wp) = Arg Max P(o | Qh ) P(qh

Citation type: mentioning

confidence: 97%
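The statement above notes that N-gram models carry over to prosody dependent language modeling. A minimal sketch of the mechanics, assuming each prosody-tagged word is treated as an ordinary vocabulary item: the token format `word_ab`, the function name `train_bigram`, the add-alpha smoothing, and the toy corpus are all illustrative assumptions, not taken from the cited paper.

```python
from collections import Counter

def train_bigram(sentences, alpha=1.0):
    """Estimate add-alpha smoothed bigram probabilities over prosody-tagged tokens."""
    context_counts = Counter()  # how often each token appears as a left context
    bigram_counts = Counter()   # how often each (previous, current) pair appears
    vocab = set()               # predictable tokens (sentence-start marker excluded)
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence)
        vocab.update(sentence)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    def prob(cur, prev):
        # P(cur | prev) with add-alpha smoothing over the tagged vocabulary.
        return (bigram_counts[(prev, cur)] + alpha) / (
            context_counts[prev] + alpha * len(vocab)
        )

    return prob, vocab

# Toy, hand-made corpus (hypothetical data). Tags: a/u = accented/unaccented,
# i/m/f = phrase-initial/medial/final.
TOY_CORPUS = [
    ["the_ui", "witch_af"],
    ["the_ui", "witch_af"],
    ["the_ui", "one_am", "which_um", "fell_af"],
]

prob, vocab = train_bigram(TOY_CORPUS)
print(prob("witch_af", "the_ui"), prob("which_um", "the_ui"))
```

Because the homophones carry different prosody tags, they are distinct vocabulary entries, so a standard bigram estimator can assign them different probabilities without any change to the estimator itself.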

“…In (Chen et al 2006), the prosody variable p_m takes 8 possible values composed by 2 discrete prosodic variables: a variable a that marks a word as either "a" (pitch-accented) or "u" (pitch-unaccented), and a variable b that marks a word as "i,m,f,o" (phrase-initial, phrase-medial, phrase-final, one-word phrase) according to its position in an intonational phrase. Thus, in this scheme, a prosody-dependent word transcription may contain prosody-dependent word tokens of the form w_ab.…”

Section: [ŵ ] = Arg Max P(o | Wp) P(wp) = Arg Max P(o | Qh ) P(qh

Citation type: mentioning

confidence: 99%
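The labeling scheme in the statement above (two accent values crossed with four phrase-position values) can be enumerated directly. The following is a small sketch; the names `ACCENT`, `POSITION`, `PROSODY_TAGS`, and `tag_word`, and the underscore in the token format, are hypothetical choices made for illustration.

```python
from itertools import product

# Label inventories as described in the quoted statement.
ACCENT = ("a", "u")                # pitch-accented, pitch-unaccented
POSITION = ("i", "m", "f", "o")    # phrase-initial, -medial, -final, one-word phrase

# Every joint value of the prosody variable: 2 x 4 = 8 combinations.
PROSODY_TAGS = [a + b for a, b in product(ACCENT, POSITION)]

def tag_word(word: str, accent: str, position: str) -> str:
    """Build a prosody-dependent token; the 'word_ab' spelling is an assumption."""
    if accent not in ACCENT:
        raise ValueError(f"unknown accent label: {accent!r}")
    if position not in POSITION:
        raise ValueError(f"unknown position label: {position!r}")
    return f"{word}_{accent}{position}"

print(PROSODY_TAGS)
print(tag_word("witch", "a", "f"))
```

Each word in a transcription then maps to one of eight tagged variants, which is why the tagged vocabulary can be up to eight times larger than the plain word vocabulary.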

“…Modeling W and P jointly in this prosody dependent framework creates a new search space in which the candidate word sequences are weighted in terms of their conformability with natural prosody. An information-theoretic analysis in (Hasegawa-Johnson et al, 2005; Chen et al, 2006) showed that it is possible for a prosody-dependent speech recognizer to improve word recognition accuracy even if the acoustic model and the language model do not separately lead to improvements. Even if prosody does not improve the recognition of words in isolation, the likelihood of the correct sentence-level transcription may be improved by a language model that correctly predicts prosody from the word string, and an acoustic model that correctly predicts the acoustic observations from the prosody.…”

Section: A Bayesian Network View For Spoken Language

Citation type: mentioning

confidence: 99%

“…Even if prosody does not improve the recognition of words in isolation, the likelihood of the correct sentence-level transcription may be improved by a language model that correctly predicts prosody from the word string, and an acoustic model that correctly predicts the acoustic observations from the prosody. In their experiments on the Radio News Corpus (Chen et al, 2006), as large as 11% word recognition accuracy improvement over a prosody independent speech recognizer was achieved by a prosody dependent recognizer that has comparable total parameter count.…”

Section: A Bayesian Network View For Spoken Language

Citation type: mentioning

confidence: 99%

“…Within each state, a 3 mixture Gaussian model is used to model the probability density of a 32-dimensional acoustic-phonetic feature stream consisting of 15 MFCCs, energy and their deltas. The allophone models in APD contain an additional one-dimensional Gaussian acoustic-prosodic observation PDF which is used to model the probability density of a nonlinearly-transformed pitch stream, as described in (Chen et al, 2006). API contains monophone models adopted from the standard SPHINX set (Lee, 1990) and is unable to detect any prosody related acoustic effects.…”

Section: Word Recognition

Citation type: mentioning

confidence: 99%


A Factored Language Model for Prosody Dependent Speech Recognition

Chen, Hasegawa‐Johnson, Cole³ (2007). Robust Speech Recognition and Understanding.



Influence of Reading Errors on the Text-Based Automatic Evaluation of Pathologic Voices

Haderlein, Nöth, Maier, et al. Text, Speech and Dialogue.


Objective vs. Subjective Evaluation of Speakers with and without Complete Dentures

Haderlein, Maier, Nöth, et al. (2009). Text, Speech and Dialogue.


Abstract. For dento-oral rehabilitation of edentulous (toothless) patients, speech intelligibility is an important criterion. 28 persons read a standardized text once with and once without wearing complete dentures. Six experienced raters evaluated the intelligibility subjectively on a 5-point scale and the voice on the 4-point Roughness-Breathiness-Hoarseness (RBH) scales. Objective evaluation was performed by Support Vector Regression (SVR) on the word accuracy (WA) and word recognition rate (WR) of a speech recognition system, and a set of 95 word-based prosodic features. The word accuracy combined with selected prosodic features showed a correlation of up to r = 0.65 to the subjective ratings for patients with dentures and r = 0.72 for patients without dentures. For the RBH scales, however, the average correlation of the feature subsets to the subjective ratings for both types of recordings was r < 0.4.
