2017
DOI: 10.31234/osf.io/57d8x
Preprint

ASR Systems as Models of Phonetic Category Perception in Adults

Abstract: We test the potential of standard Automatic Speech Recognition (ASR) systems trained on large corpora of continuous speech as quantitative models of human speech processing. In human adults, speech perception is attuned to efficiently process native speech sounds, at the expense of difficulties in processing non-native sounds. We use ABX-discriminability measures to test whether ASR models can account for the patterns of confusion between speech sounds observed in humans. We show that ASR models reproduce some…
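
As a rough illustration of the ABX-discriminability idea mentioned in the abstract, the sketch below scores a set of (A, B, X) triplets in which X is assumed to belong to the same category as A. A triplet counts as correct when X is closer to A than to B, and a tie counts as half. The function names, the toy two-dimensional vectors, and the use of a plain Euclidean distance over fixed-length vectors are all assumptions made for this example; they are not taken from the paper, whose representations and distance measure may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def abx_score(triplets, distance=euclidean):
    """Fraction of (a, b, x) triplets where x is closer to a than to b.

    Each triplet is assumed to be built so that x shares a's category.
    Ties contribute 0.5, so chance level is 0.5.
    """
    total = 0.0
    count = 0
    for a, b, x in triplets:
        d_ax = distance(a, x)
        d_bx = distance(b, x)
        if d_ax < d_bx:
            total += 1.0
        elif d_ax == d_bx:
            total += 0.5
        count += 1
    if count == 0:
        raise ValueError("at least one triplet is required")
    return total / count


# Hypothetical toy data: one correct triplet, one incorrect, one tie.
triplets = [
    ((0.0, 0.0), (1.0, 1.0), (0.1, 0.0)),  # x closer to a: correct
    ((0.0, 0.0), (1.0, 1.0), (0.9, 1.0)),  # x closer to b: incorrect
    ((0.0, 0.0), (2.0, 0.0), (1.0, 0.0)),  # equidistant: counts as 0.5
]
score = abx_score(triplets)
print(score)
```

A score near 1.0 would indicate that the representation separates the two categories well, while a score near 0.5 would indicate chance-level discrimination.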

Cited by 2 publications
(4 citation statements)
References 9 publications
(17 reference statements)
“…A few studies have investigated patterns of L2 discrimination in acoustic models, looking at overall accuracy on phonemic contrasts from languages other than the training language. But their conclusions have been based on qualitative summaries of the behaviour of the models, with no human reference data on the same stimuli (Schatz et al, 2017; Schatz & Feldman, 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…The catch trials consisted of additional, highly distinct ABX stimuli, including several which required participants to distinguish cat from dog for English speakers or caillou from hibou for French speakers. 7 No participant is tested twice on the same phone pair, and the combination of speakers is not predictive of the right answer. 8 If the same sound belongs to different inventories, it is treated as distinct, for a total of 1032 possible phonemes.…”
Section: Methods (mentioning)
confidence: 99%
“…For testing, triplets were counterbalanced into lists of 190 per participant. 7 Each triplet was tested three times, so that most contrasts are tested at least 36 times. Participants respond as to which of the two reference stimuli the probe corresponded to on a six-point scale, ranging from first for sure to second for sure, with two intermediate degrees of certainty for each.…”
Section: Perceptimatic Data Set Construction (mentioning)
confidence: 99%