2011
DOI: 10.1007/s12046-011-0049-x

Studies on inter-speaker variability in speech and its application in automatic speech recognition

Abstract: In this paper, we give an overview of the problem of inter-speaker variability and its study in many diverse areas of speech signal processing. We first give an overview of vowel-normalization studies that minimize variations in the acoustic representation of vowel realizations by different speakers. We then describe the universal-warping approach to speaker normalization which unifies many of the vowel normalization approaches and also shows the relation between speech production, perception and auditory proc…
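As a simplified illustration of what vowel normalization removes, assume two speakers differ only by a single uniform frequency-scale factor (an assumption made here; the universal-warping approach in the abstract concerns a more general warping). On a log-frequency axis that scale factor becomes an additive shift, so subtracting each speaker's mean log formant cancels it. This is the idea behind Nearey-style log-mean normalization, one classic vowel-normalization method named here as an example, not necessarily in the form the paper reviews. The function name and formant values below are hypothetical.

```python
import math
from statistics import mean


def log_mean_normalize(formants_by_vowel):
    """Nearey-style log-mean normalization for one speaker (single grand mean).

    formants_by_vowel maps a vowel label to its (F1, F2, F3) in Hz.
    Each log formant has the speaker's grand mean log formant, pooled over
    all vowels and formants, subtracted from it.
    """
    logs = [math.log(f) for fs in formants_by_vowel.values() for f in fs]
    grand_mean = mean(logs)
    return {v: tuple(math.log(f) - grand_mean for f in fs)
            for v, fs in formants_by_vowel.items()}


# Hypothetical, illustrative formant values (Hz) for speaker A.
speaker_a = {"i": (270.0, 2290.0, 3010.0),
             "a": (730.0, 1090.0, 2440.0),
             "u": (300.0, 870.0, 2240.0)}
# Speaker B is modelled as speaker A with every formant scaled by 1.17.
speaker_b = {v: tuple(1.17 * f for f in fs) for v, fs in speaker_a.items()}

if __name__ == "__main__":
    na, nb = log_mean_normalize(speaker_a), log_mean_normalize(speaker_b)
    for v in speaker_a:
        print(v, [round(x, 4) for x in na[v]], [round(x, 4) for x in nb[v]])
```

Because log(1.17 f) = log 1.17 + log f, both speakers yield identical normalized values. Real speakers are only approximately related by one scale factor, which is why per-formant variants and non-uniform warpings are also studied.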

Cited by 6 publications (3 citation statements)
References 57 publications (55 reference statements)

“…The most efficient and widely used acoustic parameters are Mel Frequency Cepstral Coefficients (MFCC) [2], [3]. The computation of these coefficients involves multiple steps, and the selection of these parameters is based on their ability for interpolation, robustness to noise, and adaptation to inter- and intra-speaker variability [4], [5].…”
Section: Introduction (mentioning; confidence: 99%)
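The "multiple steps" of MFCC computation mentioned in the statement above are usually windowing, a power spectrum, a mel-spaced triangular filterbank, a logarithm, and a DCT. The single-frame sketch below follows that common pipeline in plain Python; the parameter values (Hamming window, 512-point FFT, 26 filters, 13 coefficients, the 2595·log10 mel formula) are typical defaults assumed here, not values from the quoted paper, and pre-emphasis and liftering are omitted.

```python
import math


def hz_to_mel(f):
    # One widely used mel formula (others exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    # Hamming-window the frame, zero-pad to n_fft, and take a naive DFT.
    # O(n_fft^2) is slow but keeps the sketch dependency-free.
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [frame[i] * win[i] for i in range(n)] + [0.0] * (n_fft - n)
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale.
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mels]
    fb = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            row[k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            row[k] = (right - k) / max(right - centre, 1)
        fb.append(row)
    return fb


def mfcc_frame(frame, sr, n_fft=512, n_filters=26, n_ceps=13):
    spec = power_spectrum(frame, n_fft)
    fb = mel_filterbank(n_filters, n_fft, sr)
    # Log filterbank energies; the floor avoids log(0) for empty bands.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in fb]
    # DCT-II decorrelates the log energies; keep the first n_ceps terms.
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for i in range(n_ceps)]


if __name__ == "__main__":
    sr = 16000
    # A 25 ms frame (400 samples at 16 kHz) of a 1 kHz tone.
    frame = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(400)]
    print([round(c, 2) for c in mfcc_frame(frame, sr)])
```

Production front ends use an FFT and process overlapping frames (for example 25 ms windows with a 10 ms hop); the per-frame arithmetic is the same.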
“…Emotion is one of the internal sources that most affects the performance of the speaker verification system. Emotion induces intra-speaker vocal variability [18], even more so under stress conditions [19].…”
Section: Related Work (mentioning; confidence: 99%)
“…An important factor that characterises speech signals is vocal tract length (VTL), which varies from person to person and must be accounted for in speech systems, including speaker recognition and automatic speech recognition (ASR). For example, vocal tract length normalization (VTLN) [1], [2], [3] is widely used in ASR to reduce the effect of speaker variability due to the difference in VTL among speakers. In [4], VTL-warped features are used for ASR as a data augmentation technique.…”
Section: Introduction (mentioning; confidence: 99%)
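VTLN, as described in the statement above, is often implemented by warping the frequency axis of the filterbank with a speaker-specific factor alpha chosen by a grid search. The sketch below assumes one common piecewise-linear warp and the frequently used grid 0.88 to 1.12 in steps of 0.02. The function names, the breakpoint fraction `f_cut`, and the toy `score` are hypothetical, and toolkits differ on whether alpha or 1/alpha is applied to the frequency axis.

```python
def piecewise_linear_warp(f, alpha, f_nyq=8000.0, f_cut=0.85):
    """Warp frequency f (Hz) by factor alpha with a piecewise-linear function.

    Below the breakpoint f0 the warp is f -> alpha * f. Above it, a second
    linear segment maps f_nyq onto itself so the warped axis keeps its range.
    """
    # Breakpoint chosen so that alpha * f0 <= f_cut * f_nyq < f_nyq, which
    # keeps the upper segment's slope positive (the warp stays monotonic).
    f0 = f_cut * f_nyq * min(1.0, 1.0 / alpha)
    if f <= f0:
        return alpha * f
    # Line through (f0, alpha * f0) and (f_nyq, f_nyq).
    slope = (f_nyq - alpha * f0) / (f_nyq - f0)
    return alpha * f0 + slope * (f - f0)


def select_warp_factor(score, alphas=None):
    """Return the alpha that maximizes score(alpha) over a grid.

    In ASR, score would typically be the acoustic-model likelihood of a
    speaker's features computed with the warped filterbank; here it is any
    caller-supplied function.
    """
    if alphas is None:
        alphas = [round(0.88 + 0.02 * i, 2) for i in range(13)]  # 0.88 ... 1.12
    return max(alphas, key=score)


if __name__ == "__main__":
    # Toy score peaking at alpha = 1.04; stands in for a model likelihood.
    best = select_warp_factor(lambda a: -(a - 1.04) ** 2)
    centres = [250.0, 1000.0, 4000.0, 7500.0]  # example filter centres (Hz)
    print(best, [round(piecewise_linear_warp(c, best), 1) for c in centres])
```

For data augmentation of the kind mentioned for [4], one option (not necessarily the method used there) is to apply the same warp with a randomly drawn alpha to each copy of an utterance instead of searching for the best one.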