Acoustic correlates of speech intelligibility: the usability of the eGeMAPS feature set for atypical speech

Xue, Wei; Cucchiarini, Catia; Hout, R.W.N.M. van; Strik, Helmer

doi:10.21437/slate.2019-9

Cited by 15 publications

(7 citation statements)

References 25 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…With respect to the classification of speakers as either dysarthric or non-dysarthric, the results suggest that the intelligibility measures assigned by human raters and the probabilities computed through the objective procedure based on acoustic-phonetic features are partly complementary to each other, as also found by Bunton et al [22]. These results are also in line with previous findings that acoustic-phonetic features have correlations to speaker types or to speech intelligibility [17,22,24], to a certain extent.…”

Section: Discussionsupporting

confidence: 89%

“…Bunton et al [22] found that a restricted intensity range tended to be associated with reduced speech intelligibility in amyotrophic lateral sclerosis speakers with moderate intelligibility. Xue et al [24] investigated the usability of the eGeMAPS feature set, which contains the three mentioned features, for predicting speech intelligibility at phoneme level. Their results indicated that this feature set is potentially usable and revealed important differences between dysarthric speech and non-dysarthric speech.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Speech Intelligibility of Dysarthric Speech: Human Scores and Acoustic-Phonetic Features

Xue¹,

Hout²,

Boogmans³

et al. 2021

Interspeech 2021

Self Cite

View full text Add to dashboard Cite

We investigated speech intelligibility in dysarthric and nondysarthric speakers as measured by two commonly used metrics, ratings through the Visual Analogue Scale (VAS) and word accuracy (AcW) through orthographic transcriptions. To gain a better understanding of how acoustic-phonetic correlates could be employed to obtain more objective measures of speech intelligibility and a better classification of dysarthric and non-dysarthric speakers, we studied the relation between these measures of intelligibility and some important acoustic-phonetic correlates. We found that the two intelligibility measures are related, but distinct, and that they might refer to different components of the intelligibility construct. The acoustic-phonetic features showed no difference in the mean values between the two speaker types at the utterance level, but more than half of them played a role in classifying the two speaker types. We computed an acoustic-phonetic probability index (API) at the speaker level. API is moderately correlated to VAS ratings but not correlated to AcW. In addition, API and VAS complement each other in classifying dysarthric and non-dysarthric speakers. This suggests that the intelligibility measures assigned by human raters and acoustic-phonetic features relate to different constructs of intelligibility.

show abstract

Section: Discussionsupporting

confidence: 89%

Section: Introductionmentioning

confidence: 99%

Speech Intelligibility of Dysarthric Speech: Human Scores and Acoustic-Phonetic Features

Xue¹,

Hout²,

Boogmans³

et al. 2021

Interspeech 2021

Self Cite

View full text Add to dashboard Cite

show abstract

“…The openSMILE toolkit performs the extraction of acoustic parameters that describe the paralinguistic characteristics of the speech signal. Based on the previous success that these acoustic parameters were able to assess the personality [ 28 ], detect a speech-related disease [ 29 ] and identify the gender and age [ 30 ] of a person, we deployed the acoustic parameter sets defined in eGeMAPS and ComParE to facilitate the classification of the physical load based on speech signals [ 31 , 32 ].…”

Section: Discussionmentioning

confidence: 99%

Validation of a Speech Database for Assessing College Students’ Physical Competence under the Concept of Physical Literacy

Lee

et al. 2022

IJERPH

View full text Add to dashboard Cite

This study developed a speech database for assessing one of the elements of physical literacy—physical competence. Thirty-one healthy and native Cantonese speakers were instructed to read a material aloud after various exercises. The speech database contained four types of speech, which were collected at rest and after three exercises of the Canadian Assessment of Physical Literacy 2nd Edition. To show the possibility of detecting each exercise state, a support vector machine (SVM) was trained on the acoustic features. Two speech feature sets, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) and Computational Paralinguistics Challenge (ComParE), were utilized to perform speech signal processing. The results showed that the two stage four-class SVM were better than the stage one. The performances of both feature sets could achieve 70% accuracy (unweighted average recall (UAR)) in the three-class model after five-fold cross-validation. The UAR result of the resting and vigorous state on the two-class model running with the ComParE feature set was 97%, and the UAR of the resting and moderate state was 74%. This study introduced the process of constructing a speech database and a method that can achieve the short-time automatic classification of physical states. Future work on this corpus, including the prediction of the physical competence of young people, comparison of speech features with other age groups and further spectral analysis, are suggested.

show abstract

“…In particular, prosody features include fundamental frequency (F0), intensity measures, and voicing probabilities, as these have been widely linked to emotions (Banse and Scherer, 1996). Next, the so-called extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) (Eyben et al, 2016), which has been widely used in many recent emotion recognition challenges (e.g., Valstar, 2016;Ringeval et al, 2019;Xue et al, 2019), is also explored and contains a set of 88 acoustic parameters relating to pitch, loudness, unvoiced segments, temporal dynamics, and cepstral features. Lastly, modulation spectral features are explored as they capture second-order periodicities in the speech signal and have been shown to convey emotional information (Wu et al, 2011;Avila et al, 2021).…”

Section: Automatic Speech Recognitionmentioning

confidence: 99%

Task-specific speech enhancement and data augmentation for improved multimodal emotion recognition under noisy conditions

Kshirsagar

Pendyala

Falk

2023

Front. Comput. Sci.

View full text Add to dashboard Cite

Automatic emotion recognition (AER) systems are burgeoning and systems based on either audio, video, text, or physiological signals have emerged. Multimodal systems, in turn, have shown to improve overall AER accuracy and to also provide some robustness against artifacts and missing data. Collecting multiple signal modalities, however, can be very intrusive, time consuming, and expensive. Recent advances in deep learning based speech-to-text and natural language processing systems, however, have enabled the development of reliable multimodal systems based on speech and text while only requiring the collection of audio data. Audio data, however, is extremely sensitive to environmental disturbances, such as additive noise, thus faces some challenges when deployed “in the wild.” To overcome this issue, speech enhancement algorithms have been deployed at the input signal level to improve testing accuracy in noisy conditions. Speech enhancement algorithms can come in different flavors and can be optimized for different tasks (e.g., for human perception vs. machine performance). Data augmentation, in turn, has also been deployed at the model level during training time to improve accuracy in noisy testing conditions. In this paper, we explore the combination of task-specific speech enhancement and data augmentation as a strategy to improve overall multimodal emotion recognition in noisy conditions. We show that AER accuracy under noisy conditions can be improved to levels close to those seen in clean conditions. When compared against a system without speech enhancement or data augmentation, an increase in AER accuracy of 40% was seen in a cross-corpus test, thus showing promising results for “in the wild” AER.

show abstract

Acoustic correlates of speech intelligibility: the usability of the eGeMAPS feature set for atypical speech

Cited by 15 publications

References 25 publications

Speech Intelligibility of Dysarthric Speech: Human Scores and Acoustic-Phonetic Features

Speech Intelligibility of Dysarthric Speech: Human Scores and Acoustic-Phonetic Features

Validation of a Speech Database for Assessing College Students’ Physical Competence under the Concept of Physical Literacy

Task-specific speech enhancement and data augmentation for improved multimodal emotion recognition under noisy conditions

Contact Info

Product

Resources

About