2010 IEEE International Workshop on Machine Learning for Signal Processing
DOI: 10.1109/mlsp.2010.5589174

Automatic vocal effort detection for reliable speech recognition

Abstract: This paper describes an approach for enhancing the robustness of an isolated-word recognizer by extending its flexibility with respect to the speaker's variable vocal effort level. An analysis of the spectral properties of spoken vowels in four speaking modes (whispering, soft, normal, and loud) confirms consistent spectral tilt changes. The severe impact of vocal effort variability on the accuracy of a speaker-dependent word recognizer is presented, along with an efficient remedial measure using a multiple-model framework pai…

Cited by 9 publications (4 citation statements)
References 15 publications
“…In future work, we will test the algorithms under acoustic conditions that may affect their reliability, such as presence of non-stationary noise [24] and vocal effort fluctuations [25]. Our goal is to find an optimal of effective algorithms for the correct classification of voiced and unvoiced sounds in continuous Czech speech.…”
Section: Discussion (mentioning)
confidence: 99%
“…In future work, the long-term speech spectrum and extracted spectral parameters will be tested for their robustness to various factors affecting speech, such as emotions [7], physical fatigue [12], vocal effort [9,28], and others [4]. Furthermore, it will be useful to also investigate the influence of adverse acoustic conditions, especially non-stationary noises [29] due to their indirect influence on speech production.…”
Section: Discussion (mentioning)
confidence: 99%
“…Frequency-domain effects caused by changing the vocal effort have implications for data-driven speech technology applications relying on short-time spectral features such as mel frequency cepstral coefficients (MFCCs). In particular, the performance of, e.g., automatic speech recognition (ASR) and speaker recognition systems will be affected by vocal effort mismatch between the training and recognition phase [5] [6] [7]. In order to avoid the performance degradation caused by this mismatch, a detection system is needed to aid the recognizer in choosing acoustic models that are most appropriate for the changed conditions [7].…”
Section: Introduction (mentioning)
confidence: 99%