A phonetic model of non-native spoken word processing

Matusevych, Yevgen; Kamper, Herman; Schatz, Thomas; Feldman, Naomi H.; Goldwater, Sharon

doi:10.18653/v1/2021.eacl-main.127

Cited by 5 publications

(6 citation statements)

References 46 publications

(68 reference statements)

“…Although the vast majority of previous work has been driven by the engineering applications of AWEs, there is a growing scientific interest in using deep neural networks as cognitive models of (human) speech processing [22,23,38,39]. Therefore, we argue that this cognitively motivated direction requires us to take a closer look at the embedding space and examine the degree to which we can rely on the emergent distance as an estimate of (perceptual) dissimilarity between linguistic units.…”

Section: Discussion (mentioning)

confidence: 99%

“…Since AWE models have been recently adopted as cognitive models of infant phonetic learning [22] and cross-language non-native processing [23], we argue that more effort should be devoted to analyze and understand the emergent embedding space to make sure it behaves as expected. In this paper, we take a step in this direction and make the following contributions:…”

Section: Introduction (mentioning)

confidence: 99%

Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study

Abdullah, Mosbach, Zaitova, et al. 2021

Interspeech 2021

Several variants of deep neural networks have been successfully employed for building parametric models that project variable-duration spoken word segments onto fixed-size vector representations, or acoustic word embeddings (AWEs). However, it remains unclear to what degree we can rely on the distance in the emerging AWE space as an estimate of word-form similarity. In this paper, we ask: does the distance in the acoustic embedding space correlate with phonological dissimilarity? To answer this question, we empirically investigate the performance of supervised approaches for AWEs with different neural architectures and learning objectives. We train AWE models in controlled settings for two languages (German and Czech) and evaluate the embeddings on two tasks: word discrimination and phonological similarity. Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the word discrimination task does not necessarily yield models that better reflect word phonological similarity. Our findings highlight the necessity to rethink the current intrinsic evaluations for AWEs.

“…In this section, we train and test four neural network models on the same three datasets as before. These models have been proposed in speech technology research, in particular in low‐resource settings where transcribed data may not be available and showed high performance in word and phone discrimination tasks (Kamper et al., 2015; Kamper, 2019; Matusevych et al., 2021; Renshaw, Kamper, Jansen, & Goldwater, 2015). Fig.…”

Section: Study 2: Testing Other Models (mentioning)

confidence: 99%

Infant Phonetic Learning as Perceptual Space Learning: A Crosslinguistic Evaluation of Computational Models

et al. 2023

Self Cite

In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. This process of early phonetic learning has traditionally been framed as phonetic category acquisition. However, recent studies have hypothesized that the attunement may instead reflect a perceptual space learning process that does not involve categories. In this article, we explore the idea of perceptual space learning by implementing five different perceptual space learning models and testing them on three phonetic contrasts that have been tested in the infant speech perception literature. We reproduce and extend previous results showing that a perceptual space learning model that uses only distributional information about the acoustics of short time slices of speech can account for at least some crosslinguistic differences in infant perception. Moreover, we find that a second perceptual space learning model, which benefits from word‐level guidance, performs equally well in capturing crosslinguistic differences in infant speech perception. These results provide support for the general idea of perceptual space learning as a theory of early phonetic learning but suggest that more fine‐grained data are needed to distinguish between different formal accounts. Finally, we provide testable empirical predictions of the two most promising models and show that these are not identical, making it possible to independently evaluate each model in experiments with infants in future research.

“…In this section, we train and test four neural network models on the same three data sets as before. These models have been proposed in speech technology research, in particular in low-resource settings where transcribed data may not be available, and showed high performance in word and phone discrimination tasks (Kamper, 2019; Kamper et al., 2015; Matusevych, Kamper, Schatz, Feldman, & Goldwater, 2021; Renshaw, Kamper, Jansen, & Goldwater, 2015). Figure 2 schematically shows the difference between the models' architectures and input data.…”

Section: Study 2: Testing Other Models (mentioning)

confidence: 99%

Infant phonetic learning as perceptual space learning: A crosslinguistic evaluation of computational models

Matusevych, Schatz, Kamper, et al. 2022

Preprint

Self Cite

In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. This process of early phonetic learning has traditionally been framed as phonetic category acquisition. However, recent studies have hypothesized that the attunement may instead reflect a perceptual space learning process that does not involve categories. In this article, we explore the idea of perceptual space learning by implementing five different perceptual space learning models and testing them on three phonetic contrasts that have been tested in the infant speech perception literature. We replicate and extend previous results showing that a perceptual space learning model that uses only distributional information about the acoustics of short time slices of speech can account for at least some cross-linguistic differences in infant perception. Moreover, we find that a second perceptual space learning model which benefits from word-level guidance performs equally well in capturing cross-linguistic differences in infant speech perception. These results provide support for the general idea of perceptual space learning as a theory of early phonetic learning, but suggest that more fine-grained data is needed to distinguish between different formal accounts. Finally, we provide testable empirical predictions of the two most promising models and show that these are not identical, making it possible to independently evaluate each model in experiments with infants in future research.
