Interspeech 2018
DOI: 10.21437/interspeech.2018-1081

Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling

Abstract: This study addresses the problem of learning robust frame-level feature representations for unsupervised subword modeling in the zero-resource scenario. Robustness of the learned features is achieved through effective speaker adaptation and exploiting cross-lingual phonetic knowledge. For speaker adaptation, an out-of-domain automatic speech recognition (ASR) system is used to estimate fMLLR features for untranscribed speech of target zero-resource languages. The fMLLR features are applied in multi-task learning…
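
To make the pipeline described in the abstract concrete, here is a minimal PyTorch sketch of a multi-task DNN over speaker-adapted fMLLR inputs with a shared bottleneck. The class name, layer sizes, and the two supervision signals (unsupervised cluster labels for the target language, phone labels from an out-of-domain ASR system) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiTaskSubwordNet(nn.Module):
    """Multi-task DNN over fMLLR inputs with a shared bottleneck.

    Hypothetical sizes; the two softmax heads stand in for the mismatched
    supervision signals (unsupervised cluster labels for the target
    zero-resource language, phone labels from an out-of-domain ASR system).
    """

    def __init__(self, fmllr_dim=40, n_cluster_labels=500, n_ood_phones=120,
                 hidden_dim=1024, bottleneck_dim=40):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(fmllr_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, bottleneck_dim),  # bottleneck layer
        )
        # Task-specific output heads over the shared bottleneck.
        self.cluster_head = nn.Linear(bottleneck_dim, n_cluster_labels)
        self.phone_head = nn.Linear(bottleneck_dim, n_ood_phones)

    def forward(self, x):
        bnf = self.shared(x)  # frame-level bottleneck features
        return bnf, self.cluster_head(bnf), self.phone_head(bnf)

# One joint training step: sum the two frame-level cross-entropy losses.
model = MultiTaskSubwordNet()
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

frames = torch.randn(32, 40)              # a batch of fMLLR frames (dummy data)
cluster_y = torch.randint(0, 500, (32,))  # e.g. unsupervised cluster labels
phone_y = torch.randint(0, 120, (32,))    # e.g. out-of-domain phone labels

opt.zero_grad()
_, cluster_logits, phone_logits = model(frames)
loss = ce(cluster_logits, cluster_y) + ce(phone_logits, phone_y)
loss.backward()
opt.step()
```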

Cited by 6 publications (39 citation statements)
References 32 publications

“…A DNN was trained using these labels to generate BNF or posteriorgram representations. In [5], [14], language-mismatched ASR systems were utilized to decode the target speech, and frame labels were generated from the ASR decoding lattices. In [30], BNF representation was generated by applying multi-task learning with both in-domain and out-of-domain data [25].…”
Section: Related Work, A. Deep Learning Approaches to Unsupervised… (citation type: mentioning)
confidence: 99%
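
To make the distinction between the two representations in this statement concrete, here is a minimal sketch, assuming a hypothetical trained frame classifier with a narrow inner layer: bottleneck features (BNFs) are the activations of that inner layer, while a posteriorgram is the softmax distribution over the frame-label output layer.

```python
import torch
import torch.nn as nn

# Hypothetical trained frame classifier: the narrow layer before the
# output serves as the "bottleneck"; softmax over the output layer
# gives a per-frame posterior distribution over the training labels.
net = nn.Sequential(
    nn.Linear(40, 1024), nn.ReLU(),
    nn.Linear(1024, 40),   # bottleneck layer
    nn.Linear(40, 500),    # frame-label output layer
)

def extract_bnf(frames):
    """Run only up to and including the bottleneck layer."""
    with torch.no_grad():
        return net[:3](frames)

def extract_posteriorgram(frames):
    """Softmax over the full network's frame-label logits."""
    with torch.no_grad():
        return torch.softmax(net(frames), dim=-1)

feats = torch.randn(100, 40)         # 100 frames of input features (dummy)
bnf = extract_bnf(feats)             # (100, 40) bottleneck features
post = extract_posteriorgram(feats)  # (100, 500) posteriorgram
```
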
“…The frame labels for out-of-domain data were obtained by HMM forced alignment, while the labels for in-domain data were from DPGMM clustering [12]. In [5], [14], [31], a DNN AM was trained with transcribed data of an out-of-domain language, and used to extract BNFs or posteriorgrams from target speech.…”
Section: Related Work, A. Deep Learning Approaches to Unsupervised… (citation type: mentioning)
confidence: 99%
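
As a rough illustration of deriving per-frame labels by DPGMM clustering, the sketch below uses scikit-learn's variational Dirichlet-process GMM as a convenient stand-in for the Gibbs-sampled DPGMM in the cited work; the feature dimensionality, frame count, and truncation level are assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Stand-in for acoustic frames of the untranscribed target language
# (e.g. MFCCs); real pipelines cluster far more frames than this.
frames = np.random.randn(5000, 39)

# Variational DP-GMM: the Dirichlet-process prior lets surplus mixture
# components collapse, so the effective number of clusters is learned.
dpgmm = BayesianGaussianMixture(
    n_components=100,  # truncation level, not the final cluster count
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=200,
)
frame_labels = dpgmm.fit_predict(frames)  # pseudo-phone label per frame

# These per-frame cluster IDs then serve as DNN training targets,
# as in the citation statement above.
```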