2020
DOI: 10.48550/arxiv.2006.02295
Preprint

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages […]
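
The abstract's core object is straightforward to picture in code. Below is a minimal sketch, not the authors' model, of an encoder mapping a variable-length sequence of speech frames to a fixed-dimensional acoustic word embedding; the GRU architecture, 13-dimensional input features, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AWEEncoder(nn.Module):
    """Maps a variable-length frame sequence to one fixed-dim vector."""
    def __init__(self, n_feats=13, hidden=256, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_feats, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, segment):
        # segment: (T, n_feats); T differs from word to word.
        _, h = self.rnn(segment.unsqueeze(0))  # h: (num_layers, 1, hidden)
        return self.proj(h[-1].squeeze(0))     # (embed_dim,) regardless of T

# Segments of different durations yield same-sized embeddings, so words can
# be compared with a simple vector distance instead of sequence alignment.
enc = AWEEncoder()
a = enc(torch.randn(52, 13))  # a 52-frame word
b = enc(torch.randn(87, 13))  # an 87-frame word
sim = torch.cosine_similarity(a, b, dim=0)
```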

Cited by 3 publications (10 citation statements) · References 45 publications
“…Instead of relying on discovered words from the target zero-resource language, we can exploit labelled data from well-resourced languages to train a single multilingual supervised AWE model [31,32]. This model can then be applied to an unseen zero-resource language.…”
Section: Supervised Multilingual Models (mentioning)
confidence: 99%
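
The transfer recipe in this quote can be summarised in a short hedged sketch: pool labelled words from several well-resourced languages, train one supervised model, then apply the frozen encoder to an unseen zero-resource language. It reuses the AWEEncoder above; `load_labelled_words` is a hypothetical stub, and the word-classification objective is one simple stand-in (the paper itself trains correspondence-autoencoder and Siamese models).

```python
import torch
import torch.nn as nn

def load_labelled_words(lang):
    # Hypothetical stand-in for a real loader: returns one labelled word
    # from language `lang` as (frame sequence, word-type id).
    return torch.randn(60, 13), torch.tensor([3])

def train_multilingual(encoder, languages, n_word_types, steps=100):
    # One classifier head over the pooled multilingual vocabulary.
    clf = nn.Linear(128, n_word_types)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(clf.parameters()))
    for _ in range(steps):
        for lang in languages:  # e.g. ["es", "ru", "sv"]
            segment, word_id = load_labelled_words(lang)
            logits = clf(encoder(segment)).unsqueeze(0)  # (1, n_word_types)
            loss = nn.functional.cross_entropy(logits, word_id)
            opt.zero_grad(); loss.backward(); opt.step()
    return encoder

# After training, the classifier head is discarded and the encoder alone
# embeds words of a language it has never seen:
# encoder = train_multilingual(AWEEncoder(), ["es", "ru", "sv"], 1000)
# zero_resource_embedding = encoder(unseen_language_segment)
```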
“…Experiments in [31] showed that multilingual versions of the CAE-RNN and SIAMESERNN outperform unsupervised monolingual variants. A multilingual CONTRASTIVERNN hasn't been considered in a previous study, as far as we know.…”
Section: Supervised Multilingual Models (mentioning)
confidence: 99%
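
To make the contrast in this quote concrete, here is a hedged sketch of the two objectives: a Siamese-style triplet loss pushes a single negative a margin further away than the positive, while a contrastive (NT-Xent-style) loss, one common reading of the CONTRASTIVERNN objective, scores the positive against many negatives jointly. Margin and temperature values are illustrative.

```python
import torch
import torch.nn.functional as F

def siamese_triplet_loss(anchor, positive, negative, margin=0.25):
    # One negative per anchor: push it `margin` further than the positive.
    d_pos = 1 - F.cosine_similarity(anchor, positive, dim=0)
    d_neg = 1 - F.cosine_similarity(anchor, negative, dim=0)
    return F.relu(d_pos - d_neg + margin)

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    # negatives: (N, dim). Softmax over the positive and all N negatives;
    # the target is the positive at index 0.
    sims = torch.cat([
        F.cosine_similarity(anchor, positive, dim=0).view(1),
        F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1),
    ]) / tau
    return F.cross_entropy(sims.unsqueeze(0), torch.tensor([0]))
```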
“…To our knowledge, this is the first time an unsupervised acoustic word embedding approach outperforms DTW on the Buckeye corpus. The SiameseRNN is not competitive in this setting, potentially because it is more reliant on high-quality training pairs than the other models (which is supported by prior work [Kamper et al, 2020a]).…”
Section: Test Set Results for Unsupervised Acoustic Word Embeddings (mentioning)
confidence: 88%
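
For context on the baseline this quote refers to: DTW skips embeddings entirely and compares two segments by the cost of aligning their frame sequences, which is accurate but quadratic per comparison. A minimal sketch, assuming Euclidean frame distances and path-length normalisation (real systems often use cosine frame distances instead):

```python
import numpy as np

def dtw_cost(x, y):
    """x: (T1, d), y: (T2, d) frame sequences; lower cost = more similar."""
    T1, T2 = len(x), len(y)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            step = np.linalg.norm(x[i - 1] - y[j - 1])  # per-frame distance
            D[i, j] = step + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2] / (T1 + T2)  # normalise by a path-length proxy

# Same-word segments of different durations should score lower than
# different-word segments; each call is O(T1 * T2), which is what makes
# fixed-dimensional embeddings attractive for large-scale search.
```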