2018
DOI: 10.1121/1.5045323

Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles

Abstract: Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares text-independent speaker discrimination ability of humans and machines based on utterances shorter than 2 s in two different speaking styles (read sentences and speech directed towards pets, characterized by exaggerated prosody). Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database were used as stimuli. Performance of 65 human…

Cited by 13 publications (10 citation statements)
References 47 publications
“…As hypothesized, the worst performance for both humans and machines was obtained for style-mismatched read speech - conversation trials. Results were consistent with the hypothesis of our previous study [9] of read and pet-directed speech from the same set of speakers. That study showed that humans consistently performed better than machines in both read speech - read speech (EER = 19.02% versus 30.31%) and read speech - pet-directed speech (EER = 39.23% versus 44.17%) trials.…”
Section: Human and Machine Performance (supporting)
confidence: 92%
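The EER figures quoted above are equal error rates: the operating point at which the miss rate on same-speaker trials equals the false-alarm rate on different-speaker trials. As a minimal illustrative sketch (not the study's evaluation code), assuming per-trial similarity scores are available as two NumPy arrays, an approximate EER can be computed like this:

import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from same-speaker (target) and different-speaker
    (non-target) trial scores, using the accept rule score >= threshold."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # Miss rate: fraction of target scores falling below each candidate threshold.
    miss = np.searchsorted(np.sort(target_scores), thresholds) / len(target_scores)
    # False-alarm rate: fraction of non-target scores at or above each threshold.
    fa = 1.0 - np.searchsorted(np.sort(nontarget_scores), thresholds) / len(nontarget_scores)
    i = int(np.argmin(np.abs(miss - fa)))
    return 0.5 * (miss[i] + fa[i])

# Toy scores (synthetic, illustrative only): same-speaker trials score higher on average.
rng = np.random.default_rng(0)
tar = rng.normal(1.0, 1.0, 1000)
non = rng.normal(-1.0, 1.0, 1000)
print(f"EER ~ {100 * equal_error_rate(tar, non):.2f}%")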
“…For example, style variability confuses earwitnesses hearing a criminal shouting vs reading aloud during a voice lineup [8]. Human and machine speaker discrimination performance has been compared when style changed from read to pet-directed speech, which is characterized by exaggerated prosody [9]. In both examples, the differences in style were extreme, and little is known about how moderate variations in style, for example between read and conversational speech, affect the relative performance of humans vs machines in speaker discrimination.…”
Section: Introduction (mentioning)
confidence: 99%
“…Additionally, the impact of vocal effort ranging from whisper (Vestman et al., 2018) to shout (Hanilci et al., 2013) and scream (similar to shout but lacking phonemic structure) (Hansen et al., 2017) has been addressed in many studies. Other examples include acted speech by naive or professional speakers (Pietrowicz et al., 2017), pet-directed speech (Park et al., 2018), and the impact of varied speech …”
Section: Related Work (mentioning)
confidence: 99%
“…Style factors are shown to be present in widely used speaker representations [13] such as i-vectors [14] and x-vectors [4]. ASV performance degradation due to style mismatch between the enrollment and test utterances was systematically analyzed in [15,16,17]. To alleviate the degradation due to style variabilities, some studies proposed the use of a joint factor analysis framework [11,12].…”
Section: Introduction (mentioning)
confidence: 99%
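Fixed-dimensional speaker embeddings such as i-vectors and x-vectors are commonly compared with a simple similarity score (e.g., cosine scoring, or PLDA in full systems). The sketch below is a hypothetical illustration with synthetic 512-dimensional vectors standing in for embeddings; it is not tied to any particular extractor or toolkit.

import numpy as np

def cosine_score(enroll_embedding, test_embedding):
    """Cosine similarity between two speaker embeddings; higher scores
    suggest the two utterances come from the same speaker."""
    e = enroll_embedding / np.linalg.norm(enroll_embedding)
    t = test_embedding / np.linalg.norm(test_embedding)
    return float(np.dot(e, t))

# Synthetic embeddings (illustrative only; real x-vectors come from a trained network).
rng = np.random.default_rng(1)
enroll = rng.normal(size=512)
same = enroll + 0.3 * rng.normal(size=512)  # same speaker, small within-speaker shift
diff = rng.normal(size=512)                 # a different, unrelated speaker
print(cosine_score(enroll, same), cosine_score(enroll, diff))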