Yevgen Matusevych scite author profile

Yevgen Matusevych

5Publications

94Citation Statements Received

301Citation Statements Given

How they've been cited

How they cite others

249

291

Affiliations

Language Science (South Korea), University of Edinburgh, Tilburg University

Publications

Order By: Most citations

Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection

Corkery¹,

Matusevych²,

Goldwater³

2019

View full text Add to dashboard Cite

The cognitive mechanisms needed to account for the English past tense have long been a subject of debate in linguistics and cognitive science. Neural network models were proposed early on, but were shown to have clear flaws. Recently, however, Kirov and Cotterell (2018) showed that modern encoder-decoder (ED) models overcome many of these flaws. They also presented evidence that ED models demonstrate humanlike performance in a nonce-word task. Here, we look more closely at the behaviour of their model in this task. We find that (1) the model exhibits instability across multiple simulations in terms of its correlation with human data, and (2) even when results are aggregated across simulations (treating each simulation as an individual human participant), the fit to the human data is not strong-worse than an older rule-based model. These findings hold up through several alternative training regimes and evaluation measures. Although other neural architectures might do better, we conclude that there is still insufficient evidence to claim that neural nets are a good cognitive model for this task.

show abstract

The impact of first and second language exposure on learning second language constructions

2015

View full text Add to dashboard Cite

We study how the learning of argument structure constructions in a second language (L2) is affected by two basic input properties often discussed in literature – the amount of input and the time of L2 onset. To isolate the impact of the two factors on learning, we use a computational model that simulates bilingual construction learning. In the first two experiments we manipulate the sheer amount of L2 exposure, both in absolute and in relative terms (that is, in relation to the amount of L1 exposure). The results show that higher cumulative amount of L2 exposure leads to higher performance. In the third experiment we manipulate the prior amount of L1 input before the L2 onset (that is, the time of L2 onset). Given equal exposure, we find no negative effect of the later onset on learners’ performance. This has implications for theories of order of acquisition and bilingual construction learning.

show abstract

Multilingual Acoustic Word Embedding Models for Processing Zero-resource Languages

Kamper

Matusevych

Goldwater

2020

View full text Add to dashboard Cite

Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in "zero-resource" speech search, indexing and discovery systems. Here we propose to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zeroresource languages. For this transfer learning approach, we consider two multilingual recurrent neural network models: a discriminative classifier trained on the joint vocabularies of all training languages, and a correspondence autoencoder trained to reconstruct word pairs. We test these using a word discrimination task on six target zero-resource languages. When trained on seven well-resourced languages, both models perform similarly and outperform unsupervised models trained on the zero-resource languages. With just a single training language, the second model works better, but performance depends more on the particular training-testing language pair.Index Terms-Acoustic word embeddings, multilingual models, zero-resource speech processing, query-by-example.

show abstract

Acoustic Word Embeddings for Zero-Resource Languages Using Self-Supervised Contrastive Learning and Multilingual Adaptation

Jacobs

Matusevych

Kamper

2021

View full text Add to dashboard Cite

Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length speech segments. For zeroresource languages where labelled data is not available, one AWE approach is to use unsupervised autoencoder-based recurrent models. Another recent approach is to use multilingual transfer: a supervised AWE model is trained on several well-resourced languages and then applied to an unseen zero-resource language. We consider how a recent contrastive learning loss can be used in both the purely unsupervised and multilingual transfer settings. Firstly, we show that terms from an unsupervised term discovery system can be used for contrastive self-supervision, resulting in improvements over previous unsupervised monolingual AWE models. Secondly, we consider how multilingual AWE models can be adapted to a specific zero-resource language using discovered terms. We find that self-supervised contrastive adaptation outperforms adapted multilingual correspondence autoencoder and Siamese AWE models, giving the best overall results in a word discrimination task on six zero-resource languages.

show abstract

Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer

Kamper

Matusevych

Goldwater

2021

IEEE/ACM Trans. Audio Speech Lang. Process.

View full text Add to dashboard Cite

Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs. In a word discrimination task on six target languages, all of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision. When using only a few training languages, the multilingual CAE performs better, but with more training languages the other multilingual models perform similarly. Using more training languages is generally beneficial, but improvements are marginal on some languages. We present probing experiments which show that the CAE encodes more phonetic, word duration, language identity and speaker information than the other multilingual models.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.