2012 8th International Symposium on Chinese Spoken Language Processing
DOI: 10.1109/iscslp.2012.6423531

Minimum Phone Error model training on merged acoustic units for transcribing bilingual code-switched speech

Abstract: This paper proposes to perform Minimum Phone Error (MPE) model training on merged acoustic units for transcribing Mandarin-English code-switched lectures with a highly imbalanced language distribution. Some acoustic events in Mandarin and English may have very similar characteristics, so the states or Gaussian mixtures representing them can be merged to share identical parameters. When MPE is performed afterwards, these merged states or Gaussian mixtures form a compact acoustic unit set. …
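The merging step described in the abstract can be illustrated with a minimal sketch: pair Gaussians from the Mandarin and English acoustic models by a symmetric KL distance and tie the parameters of sufficiently close pairs. The diagonal-covariance assumption, the distance threshold, and the simple parameter averaging below are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of cross-lingual Gaussian merging (hypothetical interface):
# diagonal-covariance Gaussians, a symmetric KL distance, and a fixed threshold
# are assumptions for illustration; the paper's merging criterion may differ.
import numpy as np

def sym_kl_diag_gauss(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    kl12 = 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(var1 / var2) + (var2 + (mu1 - mu2) ** 2) / var1 - 1.0)
    return kl12 + kl21

def merge_similar_gaussians(mandarin, english, threshold=1.0):
    """For each English Gaussian, find the closest Mandarin Gaussian and,
    if the distance is below the threshold, tie their parameters (here a
    simple average; a real system would weight by occupation counts).
    Each model is a list of (mean, variance) numpy-array pairs."""
    merged = []
    for j, (mu_e, var_e) in enumerate(english):
        dists = [sym_kl_diag_gauss(mu_m, var_m, mu_e, var_e)
                 for mu_m, var_m in mandarin]
        i = int(np.argmin(dists))
        if dists[i] < threshold:
            mu_m, var_m = mandarin[i]
            merged.append((i, j, 0.5 * (mu_m + mu_e), 0.5 * (var_m + var_e)))
    return merged
```

MPE training would then be run on the resulting compact acoustic unit set, with the tied states or Gaussians sharing parameters.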

Cited by 3 publications (2 citation statements). References 11 publications (10 reference statements).
“…Small corpora have been compiled for English-Spanish [1,2], Cantonese-English [3,4], Hindi-English [5] and for Sepedi-English [6]. However, the language pair English-Mandarin has received by far the most attention [7][8][9][10][11][12][13][14]. Approaches to code-switched language modelling include interpolating n-gram language models (LM) trained on monolingual data [13], n-grams trained on code-switched data [5,7], class-based n-grams using additional features [4], recurrent neural networks [10], and combinations of approaches [11].…”
Section: Introduction (mentioning, confidence: 99%)
“…However, the language pair English-Mandarin has received by far the most attention [7][8][9][10][11][12][13][14]. Approaches to code-switched language modelling include interpolating n-gram language models (LM) trained on monolingual data [13], n-grams trained on code-switched data [5,7], class-based n-grams using additional features [4], recurrent neural networks [10], and combinations of approaches [11]. A particularly relevant recent study considered features for factored language models for Mandarin-English code-switched speech using the SEAME corpus [12].…”
Section: Introduction (mentioning, confidence: 99%)
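One of the language-modelling approaches mentioned in the citing work, interpolating n-gram language models trained on monolingual data [13], can be sketched as a simple linear combination of per-language probabilities. The callable LM interface and the fixed interpolation weight below are hypothetical; in practice the weight would be tuned on held-out code-switched data.

```python
# Minimal sketch of linearly interpolating two monolingual n-gram LMs for
# code-switched text; lm_mandarin / lm_english are hypothetical callables
# returning backed-off n-gram probabilities, and lam is a tunable weight.
def interpolated_prob(word, history, lm_mandarin, lm_english, lam=0.5):
    """P(word | history) = lam * P_mandarin + (1 - lam) * P_english."""
    return lam * lm_mandarin(word, history) + (1.0 - lam) * lm_english(word, history)
```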