Automatic Speech Recognition Used for Intelligibility Assessment of Text-to-Speech Systems

Vích, Robert; Nouza, Jan; Vondra, Martin

doi:10.1007/978-3-540-70872-8_10

Cited by 13 publications

(7 citation statements)

References 5 publications


“…Automatic speech recognition (ASR) systems are a promising but relatively unexplored solution [23, 24]. One significant limitation of ASR for this application is, however, that most approaches require a prohibitively large number of speech samples, since the approach is based on counting the percentage of correctly recognized words.…”

Section: Introduction (mentioning)

confidence: 99%

Predicting Intelligible Speaking Rate in Individuals with Amyotrophic Lateral Sclerosis from a Small Number of Speech Acoustic and Articulatory Samples

Wang, Kothalkar, Kim, et al. 2016

7th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2016)


Amyotrophic lateral sclerosis (ALS) is a rapidly progressive neurological disease that affects the speech motor functions, resulting in dysarthria, a motor speech disorder. Speech and articulation deterioration is an indicator of the disease progression of ALS; timely monitoring of the disease progression is critical for clinical management of these patients. This paper investigated machine prediction of intelligible speaking rate of nine individuals with ALS based on a small number of speech acoustic and articulatory samples. Two feature selection techniques - decision tree and gradient boosting - were used with support vector regression for predicting the intelligible speaking rate. Experimental results demonstrated the feasibility of predicting intelligible speaking rate from only a small number of speech samples. Furthermore, adding articulatory features to acoustic features improved prediction performance, when decision tree was used as the feature selection technique.
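The citation statement above describes ASR-based intelligibility scoring as counting the percentage of correctly recognized words. A minimal sketch of such a score follows. The function name `percent_words_correct` and the choice of a longest-common-subsequence alignment are assumptions made for illustration; neither the cited paper nor the citing paper specifies its scoring procedure here.

```python
def percent_words_correct(reference: str, hypothesis: str) -> float:
    """Percentage of reference words that the recognizer output reproduces.

    Hypothetical helper: words are matched in order using a
    longest-common-subsequence (LCS) alignment, so an inserted or
    deleted word does not shift every later word out of position.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # lcs[i][j] holds the LCS length of ref[:i] and hyp[:j].
    lcs = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, ref_word in enumerate(ref, start=1):
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])

    return 100.0 * lcs[len(ref)][len(hyp)] / len(ref)


# One substituted word out of four reference words.
score = percent_words_correct("the dog ran home", "the dog run home")
```

Because the score is a proportion of matched words, a stable estimate needs many words per speaker, which is the limitation the citing authors point to.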


“…The collected speech database consists of 300 records with mean duration of 5 seconds uttered in a neutral speaking style. Every record consists of five concatenated words with a similar phonetic sound in Czech but often totally different meaning (e.g. "pes", "nes", "ves"; in English: "dog", "carry", "village") usually used in the rhythm test for evaluation by the automatic speech recognition systems (ASR) [14]. These speech records were uttered by a female speaker with F0 ≈ 200 Hz, recorded at 32 kHz, and subsequently resampled to f_s = 16 kHz.…”

Section: Experiments and Results (mentioning)

confidence: 99%

Evaluation of Spectral and Prosodic Features of Speech Affected by Orthodontic Appliances Using the GMM Classifier

Přibil, Přibilová, Ďuračková 2014

Journal of Electrical Engineering


The paper describes our experiment with using the Gaussian mixture models (GMM) for classification of speech uttered by a person wearing orthodontic appliances. For the GMM classification, the input feature vectors comprise the basic and the complementary spectral properties as well as the supra-segmental parameters. Dependence of classification correctness on the number of the parameters in the input feature vector and on the computation complexity is also evaluated. In addition, an influence of the initial setting of the parameters for GMM training process was analyzed. Obtained recognition results are compared visually in the form of graphs as well as numerically in the form of tables and confusion matrices for tested sentences uttered using three configurations of orthodontic appliances.

Keywords: spectral and prosodic features of speech, effect of orthodontic appliances, GMM classifier


“…Results of the cepstral coefficient ranges and values statistical analysis are shown also in the form of histograms in a similar way as the spectral flatness ranges and values. This method can also be used for evaluation of emotional synthetic speech as a supplementary approach parallel to the listening tests [23].…”

Section: Results (mentioning)

confidence: 99%

Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

Přibil, Přibilová 2009

Measurement Science Review


The paper addresses reflection of microintonation and spectral properties in male and female acted emotional speech. Microintonation component of speech melody is analyzed regarding its spectral and statistical parameters. According to psychological research of emotional speech, different emotions are accompanied by different spectral noise. We control its amount by spectral flatness according to which the high frequency noise is mixed in voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger), and a neutral state for comparison. Calculated histograms of spectral flatness distribution are visually compared and modelled by Gamma probability distribution. Histograms of cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. Achieved statistical results show good correlation comparing male and female voices for all emotional states portrayed by several Czech and Slovak professional actors.
