2021
DOI: 10.1016/j.csl.2020.101098

Multilingual and unsupervised subword modeling for zero-resource languages

Abstract: Unsupervised subword modeling aims to learn low-level representations of speech audio in "zero-resource" settings: that is, without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should capture phonetic content and abstract away from other types of variability, such as speaker differences and channel noise. Previous work in this area has primarily focused on learning from target language data only, and has been evaluated…

Cited by 23 publications (24 citation statements)
References 43 publications (77 reference statements)
“…We perform our experiments on the GlobalPhone corpus of read speech [22]. As in [25], we treat six languages as our target zero-resource languages: Spanish (ES), Hausa (HA), Croatian (HR), Swedish (SV), Turkish (TR) and Mandarin (ZH). Each language has on average 16 hours of training, 2 hours of development and 2 hours of test data.…”
Section: Methods
Mentioning confidence: 99%
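For readers scanning the setup, here is a minimal sketch of the data configuration described in the statement above; the language codes are those quoted, and the hour counts are the rough per-language averages from the snippet, not exact corpus figures.

```python
# Hedged sketch of the GlobalPhone zero-resource setup quoted above, assuming
# roughly 16 h train / 2 h dev / 2 h test per language; the real per-language
# hours vary, so these values are illustrative averages only.
GLOBALPHONE_ZERO_RESOURCE = {
    code: {"train_hours": 16, "dev_hours": 2, "test_hours": 2}
    for code in ("ES", "HA", "HR", "SV", "TR", "ZH")  # Spanish, Hausa, Croatian, Swedish, Turkish, Mandarin
}

for code, split in GLOBALPHONE_ZERO_RESOURCE.items():
    print(code, split)
```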
“…The area under this curve is used as final evaluation metric, referred to as the average precision (AP). We use the same specific evaluation setup as in [25].…”
Section: Methods
Mentioning confidence: 99%
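As a rough illustration of the average precision (AP) metric mentioned in this statement, the sketch below computes the area under a precision-recall curve for a same-different style evaluation, assuming pairs of word segments with cosine distances and same/different labels are already available; the variable names and random placeholder data are ours, not from the cited work.

```python
# Minimal sketch of computing average precision (AP) for a same-different
# evaluation, assuming pairwise cosine distances and binary labels
# (1 = same word type, 0 = different) have already been computed.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
pair_distances = rng.random(1000)            # placeholder cosine distances in [0, 1]
pair_labels = rng.integers(0, 2, size=1000)  # placeholder same/different labels

# Smaller distance should mean "same word", so the negated distance is the score;
# AP is the area under the precision-recall curve swept over all thresholds.
ap = average_precision_score(pair_labels, -pair_distances)
print(f"average precision: {ap:.3f}")
```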
“…Another approach seeks to use pre-trained out-of-domain ASR systems to tokenize untranscribed in-domain speech and hence each frame is assigned with an ASR senone label [5], [14]. Fully unsupervised [13] or weakly supervised [15]–[17] methods for DNN training were also reported in the research on acoustic modeling for low-resource languages.…”
Section: Introduction
Mentioning confidence: 99%
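A minimal sketch of the frame-level "tokenization" idea referenced in this statement is given below, assuming an out-of-domain acoustic model that returns senone posteriors per frame; the model callable and toy data are hypothetical stand-ins, not the systems used in the cited papers.

```python
# Hedged sketch: an out-of-domain ASR acoustic model scores untranscribed
# in-domain frames, and each frame's pseudo-label is its most likely senone.
# `senone_posterior_fn` is a hypothetical stand-in for whatever pretrained
# acoustic model is actually used; it is not an API from the cited work.
import numpy as np

def frame_senone_labels(features: np.ndarray, senone_posterior_fn) -> np.ndarray:
    """Assign each frame (row of `features`) the id of its most likely senone."""
    posteriors = senone_posterior_fn(features)   # shape: (n_frames, n_senones)
    return posteriors.argmax(axis=1)             # shape: (n_frames,)

# Toy usage: random "posteriors" stand in for a real acoustic model's output.
rng = np.random.default_rng(0)
dummy_model = lambda feats: rng.random((len(feats), 100))
labels = frame_senone_labels(np.zeros((50, 39)), dummy_model)  # 50 frames, 39-dim features
print(labels.shape)  # (50,)
```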
“…Although our main focus is monolingual pretraining, we also looked briefly at multilingual pretraining, inspired by recent work on multilingual ASR [29,30] and evidence that multilingual pretraining followed by fine-tuning on a distinct target language can improve ASR on the target language [11,31,32]. These experiments did not directly compare pretraining using a similar amount of monolingual data, but such a comparison was done by [33,34] in their work on learning feature representations for a target language with no transcribed data. They found a benefit for multilingual vs monolingual pretraining given the same amount of data.…”
Section: Multilingual Pretraining
Mentioning confidence: 99%