Interspeech 2015
DOI: 10.21437/interspeech.2015-642
Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study

Cited by 49 publications (42 citation statements)
References: 29 publications

“…To evaluate the accuracy of the speech translation, following the practice in [15], we pre-train an automatic speech recognition model (which achieves 85.62 BLEU points on our test set and is comparable with [15]) to generate the corresponding text of the translated speech, and then calculate the BLEU score [29] between the generated text and the reference text. We report case-insensitive BLEU computed with the moses tokenizer and multi-bleu.perl. Because the Fisher corpus has 4 English references in the test set, we report 4-reference BLEU for the Spanish-to-English setting, and single-reference BLEU for the English-to-Spanish setting.…”
Section: Discussion (mentioning)
confidence: 99%
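As a concrete illustration of the evaluation recipe quoted above, the sketch below scores hypotheses against multiple references with case-insensitive BLEU. It uses sacrebleu as a stand-in for the moses-tokenizer/multi-bleu.perl pipeline named in the quote, and the hypothesis and reference strings are invented placeholders, not data from the cited work.

```python
# Minimal multi-reference, case-insensitive BLEU sketch (sacrebleu stands in
# for the moses tokenizer + multi-bleu.perl pipeline mentioned in the quote).
import sacrebleu

# Placeholder ASR outputs for the translated speech (one string per utterance).
hypotheses = [
    "i am going to the market tomorrow",
    "she said the meeting was cancelled",
]

# Placeholder reference sets: the Fisher Es->En test set has 4 references,
# so we pass 4 parallel reference streams; En->Es would use a single stream.
references = [
    ["I am going to the market tomorrow.", "She said the meeting was canceled."],
    ["I'm going to the market tomorrow.", "She said that the meeting was cancelled."],
    ["Tomorrow I am going to the market.", "She mentioned the meeting was cancelled."],
    ["I will go to the market tomorrow.", "She said the meeting had been cancelled."],
]

# lowercase=True gives case-insensitive BLEU, as in the quoted setup.
bleu = sacrebleu.corpus_bleu(hypotheses, references, lowercase=True)
print(f"BLEU = {bleu.score:.2f}")
```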
“…A variety of previous works [1,6,10,11,16,25,35,44] have investigated the conversion between speech and its corresponding phonetic categories (discrete tokens) in an unsupervised manner, which mimics the way human infants learn acoustic models in their mother tongue during their early years of life [39]. Among these works, the vector quantized variational autoencoder (VQ-VAE) [3,7,9,22,34–36] has been widely adopted and has shown advantages over other methods.…”
Section: Introduction (mentioning)
confidence: 99%
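To make the "discrete tokens" idea in the statement above concrete, here is a minimal numpy sketch of the vector-quantization step at the heart of a VQ-VAE: each frame-level encoder output is replaced by the index of its nearest codebook entry. The codebook and features are random placeholders; this is not the architecture of any specific cited work.

```python
# Nearest-codebook quantization: the core discretization step of a VQ-VAE.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(128, 64))   # 128 code vectors, 64-dim (placeholder)
features = rng.normal(size=(200, 64))   # 200 encoder output frames (placeholder)

# Squared Euclidean distance from every frame to every code vector.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = dists.argmin(axis=1)           # one discrete token id per frame
quantized = codebook[tokens]            # quantized vectors passed to the decoder

print(tokens[:10])                      # e.g. the first 10 unit tokens
```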
“…Previously investigated approaches can be divided into two categories, namely bottom-up modeling and top-down modeling. In the bottom-up approach, speech is viewed as a sequence of low-level components, e.g., frames or segments, which can be grouped by clustering techniques to define higher-level structures [12]–[14]. The learned clusters are regarded as the basic units to represent the language concerned.…”
Section: A. Unsupervised Acoustic Modeling (mentioning)
confidence: 99%
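A toy version of the bottom-up route described above: cluster frame-level features and treat the cluster labels as candidate units. The sketch uses k-means from scikit-learn on random placeholder features purely to illustrate the idea; the cited works use more elaborate clustering than this.

```python
# Bottom-up toy example: cluster frame-level features into candidate units.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(5000, 39))   # placeholder 39-dim MFCC-like frames

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(frames)
unit_sequence = kmeans.labels_         # one cluster (pseudo-unit) label per frame
print(unit_sequence[:20])
```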
“…The top-performing systems for discovering speech representations in ZeroSpeech 2015 and 2017 are dominated by a Bayesian non-parametric approach that clusters speech features without supervision using a Dirichlet process Gaussian mixture model (DPGMM) [4,5]. However, the DPGMM is too sensitive to acoustic variations and often produces too many subword units and a relatively high-dimensional posteriorgram, which implies a high computational cost for learning and inference as well as a greater tendency to overfit [6].…”
Section: Introduction (mentioning)
confidence: 99%
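For reference, the snippet below shows one readily available way to fit a truncated Dirichlet process Gaussian mixture and extract the frame-level posteriorgram discussed in the statement above. It uses scikit-learn's variational BayesianGaussianMixture rather than the parallel sampler of the cited paper, and the features are random placeholders, so it illustrates only the interface, not the published system.

```python
# Truncated DPGMM via variational inference; the per-frame posterior
# responsibilities form the (potentially high-dimensional) posteriorgram.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
frames = rng.normal(size=(2000, 39))         # placeholder acoustic feature frames

dpgmm = BayesianGaussianMixture(
    n_components=100,                        # truncation level; many stay unused
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=200,
    random_state=0,
).fit(frames)

posteriorgram = dpgmm.predict_proba(frames)  # shape: (n_frames, n_components)
active_units = (dpgmm.weights_ > 1e-3).sum() # effective number of discovered units
print(posteriorgram.shape, active_units)
```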