ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683338
An Investigation of Multilingual ASR Using End-to-end LF-MMI

Abstract: The end-to-end lattice-free maximum mutual information (LF-MMI) approach has recently been shown to be beneficial for automatic speech recognition (ASR) in general. More specifically, its end-to-end nature and use of context independent phone labels make it attractive for multilingual ASR. We show that end-to-end LF-MMI is indeed competitive on a low-resourced multilingual task, comfortably outperforming a connectionist temporal classification (CTC) baseline. We further investigate the feasibility of biphone c…

Cited by 15 publications (13 citation statements) · References 20 publications
“…6 shows the framework of the chain model. It uses sequence-discriminative training, and the objective function used in training is LF-MMI (lattice-free maximum mutual information) [37], [38], which aims to maximize the probability of the target sequence while minimizing the probability of all other sequences:…”
Section: A. ASR Models
confidence: 99%
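The excerpt above ends at a colon where the cited equation was dropped during extraction. For context, the MMI criterion it describes is conventionally written as follows — a standard textbook form, not necessarily the exact notation of the cited paper:

```latex
\mathcal{F}_{\mathrm{MMI}}
  = \sum_{u} \log
    \frac{p(\mathbf{O}_u \mid \mathcal{M}_{w_u})\, P(w_u)}
         {\sum_{w'} p(\mathbf{O}_u \mid \mathcal{M}_{w'})\, P(w')}
```

Here $\mathbf{O}_u$ is the observation sequence of utterance $u$, $w_u$ its reference word sequence, and $\mathcal{M}_w$ the model corresponding to word sequence $w$; the numerator rewards the target sequence while the denominator (computed lattice-free in LF-MMI) sums over all competing sequences.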
“…Earlier studies in multilingual and crosslingual recognition use context-dependent phone units, which leads to an explosion of units and also needs special care to handle context-dependent modeling across languages [26,27]. There are recent attempts to use end-to-end ASR models such as CTC with monophones [8,25] or end-to-end LF-MMI with biphones [28,27] for multilingual and crosslingual recognition. Remarkably, the end-to-end CTC-CRF model, which is defined by a CRF (conditional random field) with CTC topology, has been shown to perform significantly better than CTC [20,21].…”
Section: Related Work
confidence: 99%
“…The reduced target set modeling refers to employing fewer target labels than in the combined target set modeling based E2E code-switching ASR system. Recently, in the context of the multilingual ASR task [45], the authors successfully used the union of the phone sets of the underlying languages as targets for the E2E ASR system, instead of the combined character set. Motivated by that, in an earlier work [28], we had defined a common phone set of 62 labels that covers both the Hindi and English languages.…”
Section: B. Reduced Target Set Modeling
confidence: 99%
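The union-of-phone-sets idea in the excerpt above can be made concrete with a toy sketch. The phone inventories below are hypothetical (not the 62-label set from the cited work); the point is only that taking a set union counts shared phones once, keeping the target inventory smaller than concatenating per-language inventories:

```python
# Hypothetical per-language phone inventories (illustration only).
hindi_phones = {"a", "aa", "k", "kh", "t", "th", "n"}
english_phones = {"a", "k", "t", "n", "s", "z"}

# Union: phones common to both languages become a single shared target,
# unlike concatenation, which would keep duplicates per language.
shared_targets = sorted(hindi_phones | english_phones)
print(len(shared_targets))  # 9 distinct targets, vs. 13 if concatenated
```

In a real system each shared target would tie the output units of the acoustic model across languages, which is what makes the reduced target set attractive for code-switching ASR.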