2021
DOI: 10.1109/taslp.2021.3060805
Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer

Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages…

Cited by 12 publications (11 citation statements) · References 61 publications
“…In the zero-resource setting we don't have labelled data in the target language to construct the positive and negative word pairs required for training. We therefore follow the approach of [29], and train a multilingual model on ground truth word pairs (extracted from forced alignments) from a number of languages for which we have labelled data. Subsequently, at test time, we apply the encoder RNN from the multilingual model to extract AWEs for speech from the target zero-resource language.…”
Section: Acoustic Word Embedding Model
confidence: 99%
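The extraction step described in the quote above (applying an encoder RNN to map a variable-length speech segment to a fixed-dimensional acoustic word embedding) can be sketched as follows. This is a minimal NumPy illustration, not the authors' actual model: the vanilla-RNN cell, the feature and embedding dimensions, and the use of the final hidden state as the embedding are all assumptions for the sake of the example.

```python
import numpy as np

def rnn_encode(frames, W_x, W_h, b):
    """Map a variable-length sequence of acoustic frames, shape
    (T, feat_dim), to one fixed-dimensional embedding by returning
    the final hidden state of a vanilla RNN."""
    h = np.zeros(W_h.shape[0])
    for x in frames:
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h

rng = np.random.default_rng(0)
feat_dim, embed_dim = 13, 64          # e.g. MFCC frames -> 64-d AWE (assumed sizes)
W_x = rng.normal(scale=0.1, size=(embed_dim, feat_dim))
W_h = rng.normal(scale=0.1, size=(embed_dim, embed_dim))
b = np.zeros(embed_dim)

# Two word segments of different durations map to same-sized embeddings.
short_word = rng.normal(size=(20, feat_dim))   # 20 frames
long_word = rng.normal(size=(95, feat_dim))    # 95 frames
e1 = rnn_encode(short_word, W_x, W_h, b)
e2 = rnn_encode(long_word, W_x, W_h, b)
print(e1.shape, e2.shape)  # both (64,)
```

In the multilingual-transfer setting, the weights would be trained on word pairs from the well-resourced languages and then applied unchanged to segments from the zero-resource language.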
“…However, there still exists a large performance gap between these unsupervised models and their supervised counterparts [11,26]. A recent alternative for obtaining AWEs on a zero-resource language is to use multilingual transfer learning [27][28][29][30][31]. The goal is to have the benefits of supervised learning by training a model on labelled data from multiple well-resourced languages, but to then apply the model to an unseen target zero-resource language without fine-tuning it, a form of transductive transfer learning [32].…”
Section: Introduction
confidence: 99%
“…The comparison between a pair of segments is carried out based on speech embedding, which is a fixed-dimension representation that encodes acoustic information of speech segments into a low-dimension space. It allows flexible modeling and processing of speech segments of variable length for different downstream tasks, e.g., spoken term detection and discovery [14], pathological speech classification [15], prediction of speech intelligibility score [16], etc. A similarity or distance score can be calculated on each pair of embeddings.…”
Section: SSD Detection System
confidence: 99%
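The pairwise comparison described in the quote above reduces to a single vector operation once both segments are embedded. A minimal sketch, using cosine similarity as one common choice of score (the source does not specify which similarity or distance function is used):

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between two fixed-dimensional embeddings:
    1.0 for identical direction, 0.0 for orthogonal, -1.0 for opposite."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for two encoded speech segments.
a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, a))    # 1.0
print(cosine_similarity(a, -a))   # -1.0
```

Because the embeddings are fixed-dimensional, this comparison costs O(d) per pair regardless of the original segment durations, which is what makes embedding-based search and discovery cheaper than frame-level alignment.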