2020
DOI: 10.48550/arxiv.2006.02295
Preprint

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages […]
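
The abstract's core object is straightforward to picture in code. Below is a minimal sketch, not the authors' model, of an encoder mapping a variable-length sequence of speech frames to a fixed-dimensional acoustic word embedding; the GRU architecture, 13-dimensional input features, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AWEEncoder(nn.Module):
    """Maps a variable-length frame sequence to one fixed-dim vector."""
    def __init__(self, n_feats=13, hidden=256, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_feats, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, segment):
        # segment: (T, n_feats); T differs from word to word.
        _, h = self.rnn(segment.unsqueeze(0))  # h: (num_layers, 1, hidden)
        return self.proj(h[-1].squeeze(0))     # (embed_dim,) regardless of T

# Segments of different durations yield same-sized embeddings, so words can
# be compared with a simple vector distance instead of sequence alignment.
enc = AWEEncoder()
a = enc(torch.randn(52, 13))  # a 52-frame word
b = enc(torch.randn(87, 13))  # an 87-frame word
sim = torch.cosine_similarity(a, b, dim=0)
```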

Cited by 3 publications (10 citation statements) · References 45 publications
“…Instead of relying on discovered words from the target zero-resource language, we can exploit labelled data from well-resourced languages to train a single multilingual supervised AWE model [31,32]. This model can then be applied to an unseen zero-resource language.…”
Section: Supervised Multilingual Models (mentioning)
confidence: 99%
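
The transfer recipe in this quote can be summarised in a short hedged sketch: pool labelled words from several well-resourced languages, train one supervised model, then apply the frozen encoder to an unseen zero-resource language. It reuses the AWEEncoder above; `load_labelled_words` is a hypothetical stub, and the word-classification objective is one simple stand-in (the paper itself trains correspondence-autoencoder and Siamese models).

```python
import torch
import torch.nn as nn

def load_labelled_words(lang):
    # Hypothetical stand-in for a real loader: returns one labelled word
    # from language `lang` as (frame sequence, word-type id).
    return torch.randn(60, 13), torch.tensor([3])

def train_multilingual(encoder, languages, n_word_types, steps=100):
    # One classifier head over the pooled multilingual vocabulary.
    clf = nn.Linear(128, n_word_types)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(clf.parameters()))
    for _ in range(steps):
        for lang in languages:  # e.g. ["es", "ru", "sv"]
            segment, word_id = load_labelled_words(lang)
            logits = clf(encoder(segment)).unsqueeze(0)  # (1, n_word_types)
            loss = nn.functional.cross_entropy(logits, word_id)
            opt.zero_grad(); loss.backward(); opt.step()
    return encoder

# After training, the classifier head is discarded and the encoder alone
# embeds words of a language it has never seen:
# encoder = train_multilingual(AWEEncoder(), ["es", "ru", "sv"], 1000)
# zero_resource_embedding = encoder(unseen_language_segment)
```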
“…Experiments in [31] showed that multilingual versions of the CAE-RNN and SIAMESERNN outperform unsupervised monolingual variants. A multilingual CONTRASTIVERNN hasn't been considered in a previous study, as far as we know.…”
Section: Supervised Multilingual Models (mentioning)
confidence: 99%
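
To make the contrast in this quote concrete, here is a hedged sketch of the two objectives: a Siamese-style triplet loss pushes a single negative a margin further away than the positive, while a contrastive (NT-Xent-style) loss, one common reading of the CONTRASTIVERNN objective, scores the positive against many negatives jointly. Margin and temperature values are illustrative.

```python
import torch
import torch.nn.functional as F

def siamese_triplet_loss(anchor, positive, negative, margin=0.25):
    # One negative per anchor: push it `margin` further than the positive.
    d_pos = 1 - F.cosine_similarity(anchor, positive, dim=0)
    d_neg = 1 - F.cosine_similarity(anchor, negative, dim=0)
    return F.relu(d_pos - d_neg + margin)

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    # negatives: (N, dim). Softmax over the positive and all N negatives;
    # the target is the positive at index 0.
    sims = torch.cat([
        F.cosine_similarity(anchor, positive, dim=0).view(1),
        F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1),
    ]) / tau
    return F.cross_entropy(sims.unsqueeze(0), torch.tensor([0]))
```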
“…To our knowledge, this is the first time an unsupervised acoustic word embedding approach outperforms DTW on the Buckeye corpus. The SiameseRNN is not competitive in this setting, potentially because it is more reliant on high-quality training pairs than the other models (which is supported by prior work [Kamper et al, 2020a]).…”
Section: Test Set Results for Unsupervised Acoustic Word Embeddings (mentioning)
confidence: 88%
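
For context on the baseline this quote refers to: DTW skips embeddings entirely and compares two segments by the cost of aligning their frame sequences, which is accurate but quadratic per comparison. A minimal sketch, assuming Euclidean frame distances and path-length normalisation (real systems often use cosine frame distances instead):

```python
import numpy as np

def dtw_cost(x, y):
    """x: (T1, d), y: (T2, d) frame sequences; lower cost = more similar."""
    T1, T2 = len(x), len(y)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            step = np.linalg.norm(x[i - 1] - y[j - 1])  # per-frame distance
            D[i, j] = step + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2] / (T1 + T2)  # normalise by a path-length proxy

# Same-word segments of different durations should score lower than
# different-word segments; each call is O(T1 * T2), which is what makes
# fixed-dimensional embeddings attractive for large-scale search.
```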