2021
DOI: 10.1109/taslp.2021.3060805
Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer

Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages…

Cited by 12 publications (11 citation statements) · References 61 publications
“…In the zero-resource setting we don't have labelled data in the target language to construct the positive and negative word pairs required for training. We therefore follow the approach of [29], and train a multilingual model on ground truth word pairs (extracted from forced alignments) from a number of languages for which we have labelled data. Subsequently, at test time, we apply the encoder RNN from the multilingual model to extract AWEs for speech from the target zero-resource language.…”
Section: Acoustic Word Embedding Model
confidence: 99%
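The extraction step described in the quote above (applying an encoder RNN to map a variable-length speech segment to a fixed-dimensional acoustic word embedding) can be sketched as follows. This is a minimal NumPy illustration, not the authors' actual model: the vanilla-RNN cell, the feature and embedding dimensions, and the use of the final hidden state as the embedding are all assumptions for the sake of the example.

```python
import numpy as np

def rnn_encode(frames, W_x, W_h, b):
    """Map a variable-length sequence of acoustic frames, shape
    (T, feat_dim), to one fixed-dimensional embedding by returning
    the final hidden state of a vanilla RNN."""
    h = np.zeros(W_h.shape[0])
    for x in frames:
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h

rng = np.random.default_rng(0)
feat_dim, embed_dim = 13, 64          # e.g. MFCC frames -> 64-d AWE (assumed sizes)
W_x = rng.normal(scale=0.1, size=(embed_dim, feat_dim))
W_h = rng.normal(scale=0.1, size=(embed_dim, embed_dim))
b = np.zeros(embed_dim)

# Two word segments of different durations map to same-sized embeddings.
short_word = rng.normal(size=(20, feat_dim))   # 20 frames
long_word = rng.normal(size=(95, feat_dim))    # 95 frames
e1 = rnn_encode(short_word, W_x, W_h, b)
e2 = rnn_encode(long_word, W_x, W_h, b)
print(e1.shape, e2.shape)  # both (64,)
```

In the multilingual-transfer setting, the weights would be trained on word pairs from the well-resourced languages and then applied unchanged to segments from the zero-resource language.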
“…However, there still exists a large performance gap between these unsupervised models and their supervised counterparts [11,26]. A recent alternative for obtaining AWEs on a zero-resource language is to use multilingual transfer learning [27][28][29][30][31]. The goal is to have the benefits of supervised learning by training a model on labelled data from multiple well-resourced languages, but to then apply the model to an unseen target zero-resource language without fine-tuning it, a form of transductive transfer learning [32].…”
Section: Introduction
confidence: 99%
“…The comparison between a pair of segments is carried out based on speech embedding, which is a fixed-dimension representation that encodes acoustic information of speech segments into a low-dimension space. It allows flexible modeling and processing of speech segments of variable length for different downstream tasks, e.g., spoken term detection and discovery [14], pathological speech classification [15], prediction of speech intelligibility score [16], etc. A similarity or distance score can be calculated on each pair of embeddings.…”
Section: SSD Detection System
confidence: 99%
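The pairwise comparison described in the quote above reduces to a single vector operation once both segments are embedded. A minimal sketch, using cosine similarity as one common choice of score (the source does not specify which similarity or distance function is used):

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between two fixed-dimensional embeddings:
    1.0 for identical direction, 0.0 for orthogonal, -1.0 for opposite."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for two encoded speech segments.
a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, a))    # 1.0
print(cosine_similarity(a, -a))   # -1.0
```

Because the embeddings are fixed-dimensional, this comparison costs O(d) per pair regardless of the original segment durations, which is what makes embedding-based search and discovery cheaper than frame-level alignment.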