ASR Systems as Models of Phonetic Category Perception in Adults

Schatz, Thomas; Bach, Francis; Dupoux, Emmanuel

doi:10.31234/osf.io/57d8x

Cited by 2 publications (4 citation statements)

References 9 publications (17 reference statements)


“…A few studies have investigated patterns of L2 discrimination in acoustic models, looking at overall accuracy on phonemic contrasts from languages other than the training language. But their conclusions have been based on qualitative summaries of the behaviour of the models, with no human reference data on the same stimuli (Schatz et al., 2017; Schatz & Feldman, 2018).…”

Section: Introduction (mentioning)

confidence: 99%

Comparing unsupervised speech learning directly to human performance in speech perception

Millet, Jurov, Dunbar

2019

Preprint


We compare the performance of humans (English and French listeners) with that of an unsupervised speech model in a perception experiment (an ABX discrimination task). Although the ABX task has been used to evaluate acoustic models in previous research, its results have not, until now, been compared directly with human behaviour in an experiment. We show that a standard, well-performing model (DPGMM) predicts human responses more accurately than the acoustic baseline. The model also shows a native language effect, more closely resembling native listeners of the language on which it was trained. However, the native language effect shown by the models differs from the one shown by the human listeners, and, notably, the models do not show the same overall patterns of vowel confusions.
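In an ABX trial, a model "answers" by checking whether the probe X is closer to A or to B in its representation space. The sketch below is a minimal illustration of that idea, assuming each stimulus is a sequence of feature frames and using dynamic time warping (DTW) with a cosine distance between frames, a common choice in ABX evaluations but not necessarily the exact setup of the studies listed here. The function names, the toy frames, and the normalization by `len(x) + len(y)` are hypothetical simplifications.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def cosine_distance(u: Frame, v: Frame) -> float:
    """1 - cosine similarity; 0 when two frames point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / norm


def dtw_distance(x: List[Frame], y: List[Frame]) -> float:
    """DTW alignment cost between two frame sequences.

    Normalized by len(x) + len(y) (a simple, assumed normalization) so that
    longer stimuli do not automatically get larger distances.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(x[i - 1], y[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def abx_delta(a: List[Frame], b: List[Frame], x: List[Frame]) -> float:
    """Positive when X is closer to A than to B."""
    return dtw_distance(b, x) - dtw_distance(a, x)


def abx_correct(a: List[Frame], b: List[Frame], x: List[Frame]) -> bool:
    """By convention X belongs to the same category as A, so delta > 0 is correct."""
    return abx_delta(a, b, x) > 0


if __name__ == "__main__":
    # Hypothetical toy frames: A and X share a direction, B points elsewhere.
    a = [[1.0, 0.0], [0.9, 0.1]]
    b = [[0.0, 1.0], [0.1, 0.9]]
    x = [[1.0, 0.05], [0.95, 0.0], [0.8, 0.2]]
    print(abx_correct(a, b, x))  # prints True
```

Averaging `abx_correct` over many triplets gives a model ABX accuracy that can be set beside listeners' accuracy on the same stimuli; the signed `abx_delta` keeps graded information that a binary choice discards.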


“…The catch trials consisted of additional, highly distinct ABX stimuli, including several which required participants to distinguish cat from dog for English speakers or caillou from hibou for French speakers.⁷ No participant is tested twice on the same phone pair, and the combination of speakers is not predictive of the right answer.⁸ If the same sound belongs to different inventories, it is treated as distinct, for a total of 1032 possible phonemes.…”

Section: Methods (mentioning)

confidence: 99%

“…For testing, triplets were counterbalanced into lists of 190 per participant.⁷ Each triplet was tested three times, so that most contrasts are tested at least 36 times. Participants respond as to which of the two reference stimuli the probe corresponded to on a six-point scale, ranging from first for sure to second for sure, with two intermediate degrees of certainty for each.…”

Section: Perceptimatic Data Set Construction (mentioning)

confidence: 99%
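The six-point response scale quoted above (three certainty levels per reference stimulus) can be turned into a signed score per trial. The sketch below assumes a hypothetical coding in which option 1 is "first for sure" and option 6 is "second for sure", mapped to +3, +2, +1, -1, -2, -3 and flipped so that positive always means "leaned toward the correct reference". This coding and the helper names are illustrative assumptions, not the procedure documented for the Perceptimatic data set.

```python
from statistics import mean
from typing import Iterable, Tuple

# Hypothetical coding: 1 = "first for sure" ... 6 = "second for sure".
SCALE_TO_SCORE = {1: 3, 2: 2, 3: 1, 4: -1, 5: -2, 6: -3}


def signed_score(response: int, x_matches_first: bool) -> int:
    """Score a 1-6 response; positive means the listener favoured the correct reference."""
    if response not in SCALE_TO_SCORE:
        raise ValueError(f"response must be in 1..6, got {response}")
    score = SCALE_TO_SCORE[response]
    return score if x_matches_first else -score


def triplet_summary(responses: Iterable[int], x_matches_first: bool) -> Tuple[float, float]:
    """Return (mean signed score, proportion correct) over repeated presentations of one triplet."""
    scores = [signed_score(r, x_matches_first) for r in responses]
    if not scores:
        raise ValueError("no responses for this triplet")
    accuracy = mean(1.0 if s > 0 else 0.0 for s in scores)
    return float(mean(scores)), accuracy


if __name__ == "__main__":
    # Three hypothetical listeners respond to one triplet whose probe matches the first reference.
    print(triplet_summary([1, 2, 4], x_matches_first=True))
```

Keeping the graded score rather than only the binary choice preserves listeners' certainty, which is what makes a fine-grained comparison with a model's continuous ABX distance difference possible.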

“…The finer-grained study in [3] compared Japanese speakers' perceptual boundaries for the (allophonic) [s]/[C] contrast with the behaviour of GMM phoneme classifiers, while [4] compared phonetic adaptation by Dutch listeners to artificially "accented" productions of the [r]/[l] to acoustic models adapted to the experimental stimuli. Several studies also investigate whether ASR and unsupervised speech representation learning accords with established phenomena in human speech perception at a qualitative level [5,6,7,8].…”

Section: Introduction (mentioning)

confidence: 99%


Perceptimatic: A Human Speech Perception Benchmark for Unsupervised Subword Modelling

Millet, Dunbar

2020

Interspeech 2020


In this paper, we present a data set and methods to compare speech processing models and human behaviour on a phone discrimination task. We provide Perceptimatic, an open data set which consists of French and English speech stimuli, as well as the results of 91 English- and 93 French-speaking listeners. The stimuli test a wide range of French and English contrasts, and are extracted directly from corpora of natural running read speech used for the 2017 Zero Resource Speech Challenge. We provide a method to compare humans' perceptual space with models' representational space, and we apply it to models previously submitted to the Challenge. We show that, unlike unsupervised models and supervised multilingual models, a standard supervised monolingual HMM-GMM phone recognition system, while good at discriminating phones, yields a representational space very different from that of human native listeners.
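The abstract does not spell out how the two spaces are compared. One simple possibility, shown here only as an illustration and not as the paper's method, is to compute a model score per triplet (for instance a DTW distance difference between B-X and A-X) and a human score per triplet (for instance a mean signed confidence), then measure their rank agreement with a Spearman correlation. The function names below are hypothetical; ties receive average ranks.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(model_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman rank correlation between per-triplet model and human scores."""
    if len(model_scores) != len(human_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(model_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical per-triplet scores for four triplets.
    model_deltas = [0.12, 0.30, 0.05, 0.22]
    human_means = [1.0, 2.7, 0.3, 2.0]
    print(spearman(model_deltas, human_means))
```

A rank correlation only asks whether the model finds the same triplets easy or hard as listeners do, independently of the scale of its distances; other comparisons (for example, regression models predicting individual responses) are equally plausible readings of the abstract.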
