2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru.2017.8269009
Multilingual bottle-neck feature learning from untranscribed speech

Cited by 25 publications (41 citation statements)
References 21 publications
“…Supervised DNN training requires frame-level labels for all training speech, which can be obtained either via a clustering process or by exploiting out-of-domain resources. In [11], [12], DPGMM clustering was performed on conventional short-time spectral features of the target speech, followed by multilingual DNN training to obtain the BNF representation. In [13], a GMM universal background model (GMM-UBM) was used to generate frame labels.…”
Section: Related Work, A. Deep Learning Approaches to Unsupervised …
confidence: 99%
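
Below is a minimal sketch of the two-stage recipe this excerpt describes, reduced to a single-language toy version: DPGMM clustering of spectral frames produces frame-level pseudo labels, and a DNN with a narrow bottleneck layer is then trained on those labels so that its bottleneck activations serve as BNFs. It uses scikit-learn's BayesianGaussianMixture (a truncated Dirichlet-process GMM) and PyTorch; the feature dimension, layer sizes, cluster count, and training settings are illustrative assumptions, not the cited papers' configurations.

import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import BayesianGaussianMixture

# Stage 1: DPGMM clustering of short-time spectral features.
# Stand-in for real MFCC/filterbank frames: (n_frames, feat_dim).
frames = np.random.randn(2000, 39).astype(np.float32)

dpgmm = BayesianGaussianMixture(
    n_components=50,  # truncation level for the Dirichlet process
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=200,
)
labels = dpgmm.fit_predict(frames)  # frame-level pseudo labels
n_classes = int(labels.max()) + 1

# Stage 2: DNN with a narrow bottleneck layer, trained on the pseudo labels.
class BottleneckDNN(nn.Module):
    def __init__(self, in_dim, bn_dim, n_out):
        super().__init__()
        self.pre = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        self.bottleneck = nn.Linear(512, bn_dim)  # BNFs are read from here
        self.post = nn.Sequential(nn.ReLU(), nn.Linear(bn_dim, n_out))

    def forward(self, x):
        bnf = self.bottleneck(self.pre(x))
        return self.post(bnf), bnf

model = BottleneckDNN(in_dim=39, bn_dim=40, n_out=n_classes)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.from_numpy(frames)
y = torch.from_numpy(labels).long()
for _ in range(10):  # toy training loop; real systems use minibatches/epochs
    logits, _ = model(x)
    loss = loss_fn(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, bnfs = model(x)  # (n_frames, 40) bottleneck features

In the cited works the DNN is trained multilingually, with pseudo labels pooled or multi-tasked across several languages; the single softmax above is only to keep the sketch short.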
“…In [30], the BNF representation was generated by applying multi-task learning to both in-domain and out-of-domain data [25]. The frame labels for the out-of-domain data were obtained by HMM forced alignment, while the labels for the in-domain data came from DPGMM clustering [12]. In [5], [14], [31], a DNN acoustic model (AM) was trained on transcribed data of an out-of-domain language and used to extract BNFs or posteriorgrams from the target speech.…”
Section: Related Work, A. Deep Learning Approaches to Unsupervised …
confidence: 99%
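
A hedged sketch of the multi-task setup this excerpt describes: a shared trunk ending in a common bottleneck layer feeds two softmax heads, one trained on out-of-domain forced-alignment labels and one on in-domain DPGMM cluster labels. The dimensions, label counts, and the simple alternating update schedule are assumptions for illustration, not the cited papers' exact configurations.

import torch
import torch.nn as nn

class MultiTaskBNF(nn.Module):
    def __init__(self, in_dim=39, bn_dim=40, n_ood=1000, n_ind=50):
        super().__init__()
        self.shared = nn.Sequential(  # shared trunk ending in the bottleneck
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, bn_dim), nn.ReLU(),
        )
        self.head_ood = nn.Linear(bn_dim, n_ood)  # out-of-domain senone labels
        self.head_ind = nn.Linear(bn_dim, n_ind)  # in-domain DPGMM cluster labels

    def forward(self, x, task):
        h = self.shared(x)  # h is the shared bottleneck (BNF) representation
        return self.head_ood(h) if task == "ood" else self.head_ind(h)

model = MultiTaskBNF()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy stand-ins for the two label streams (forced alignment vs. DPGMM).
x_ood, y_ood = torch.randn(64, 39), torch.randint(0, 1000, (64,))
x_ind, y_ind = torch.randn(64, 39), torch.randint(0, 50, (64,))

for _ in range(10):  # alternate updates between the two tasks
    for x, y, task in ((x_ood, y_ood, "ood"), (x_ind, y_ind, "ind")):
        loss = loss_fn(model(x, task), y)
        opt.zero_grad(); loss.backward(); opt.step()

Because both heads backpropagate through the same bottleneck, the learned BNFs are shaped jointly by the supervised out-of-domain task and the unsupervised in-domain clustering task.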
“…The works in [7, 16] estimated fMLLR transforms using a pre-trained out-of-domain ASR system. Chen et al. [8] applied vocal tract length normalization (VTLN). Another direction is to employ DNNs.…”
Section: Introduction
confidence: 99%
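
As a small illustration of the VTLN mentioned above, the following is a piecewise-linear frequency warp of the kind commonly used for speaker normalization: frequencies below a boundary are scaled by a per-speaker warp factor alpha, and the remaining band is mapped linearly so the Nyquist frequency is preserved. The boundary value and the grid-search selection of alpha are illustrative assumptions, not details taken from [8].

import numpy as np

def vtln_warp(freqs, alpha, f_nyq=8000.0, boundary=0.8):
    """Piecewise-linear VTLN warp: f -> alpha * f below the boundary, then a
    linear segment chosen so that f_nyq maps to itself."""
    f0 = boundary * f_nyq
    return np.where(
        freqs <= f0,
        alpha * freqs,
        alpha * f0 + (freqs - f0) * (f_nyq - alpha * f0) / (f_nyq - f0),
    )

# In a full system, alpha is typically chosen per speaker by a grid search
# (e.g., 0.80 to 1.20) that maximizes likelihood under a reference GMM/HMM;
# here the warp is simply applied to mel-filterbank center frequencies.
centers = np.linspace(0.0, 8000.0, 10)
print(vtln_warp(centers, alpha=0.9))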