ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683664
A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-trained Neural Network Acoustic Models

Abstract: In this work, three lattice-free (LF) discriminative training criteria for purely sequence-trained neural network acoustic models are compared on LVCSR tasks: maximum mutual information (MMI), boosted maximum mutual information (bMMI), and state-level minimum Bayes risk (sMBR). We demonstrate that, analogous to LF-MMI, a neural network acoustic model can also be trained from scratch using the LF-bMMI or LF-sMBR criterion without the need for cross-entropy pre-training. Furthermore, experimental re…
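For orientation, here is a minimal sketch of the three objectives in generic notation (ours, not necessarily the paper's): O_u is the acoustics of utterance u, W_u its reference transcript, kappa the acoustic scale, b the boosting factor, and A(W', W_u) a state-level accuracy of hypothesis W' against the reference.

    \begin{align}
    \mathcal{F}_{\mathrm{MMI}}  &= \sum_{u} \log
      \frac{p(\mathbf{O}_u \mid W_u)^{\kappa}\, P(W_u)}
           {\sum_{W'} p(\mathbf{O}_u \mid W')^{\kappa}\, P(W')} \\
    \mathcal{F}_{\mathrm{bMMI}} &= \sum_{u} \log
      \frac{p(\mathbf{O}_u \mid W_u)^{\kappa}\, P(W_u)}
           {\sum_{W'} p(\mathbf{O}_u \mid W')^{\kappa}\, P(W')\, e^{-b\, A(W', W_u)}} \\
    \mathcal{F}_{\mathrm{sMBR}} &= \sum_{u} \sum_{W'} P(W' \mid \mathbf{O}_u)\, A(W', W_u)
    \end{align}

In the lattice-free setting, the denominator sums are evaluated exactly by forward-backward recursion over a phone-level LM graph rather than over word lattices.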

Cited by 6 publications (4 citation statements) | References 20 publications
“…Third, the initialization of speaker embeddings should be explored with more advanced speaker diarization techniques [10,37,38]. Finally, advanced ASR techniques, such as data augmentation [39-42], model ensembles [43-45], and improved training criteria [46,47], will also improve overall performance. We will explore these directions for future work.…”
Section: Discussion
confidence: 99%
“…The LF-MMI [5] criterion was extended to include boosting [22] in [23,24]. Here, we present it again in the generalized hybrid-model framework for different modeling units and label topologies.…”
Section: LF-bMMI Training
confidence: 99%
“…Implementation-wise, however, in the lattice-free training framework it is easiest to define this as a sum of per-frame accuracy values. Therefore, as in [24], we use the numerator posterior derived from the numerator graph as a proxy for the per-frame state-level accuracy values. The intuition behind boosted MMI can also be interpreted through max-margin learning [25,26].…”
Section: LF-bMMI Training
confidence: 99%
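A minimal sketch of how that proxy can enter the computation, assuming NumPy-like [T, S] arrays and a hypothetical den_forward_backward routine that runs the forward-backward recursion over the denominator graph (all names here are illustrative, not from the paper or any toolkit):

    def lf_bmmi_grad(log_likes, num_post, den_forward_backward, b=0.5):
        # log_likes, num_post: [T, S] per-frame state log-likelihoods and
        # numerator-graph occupancies for one utterance.
        # Because the boosting term e^{-b A} decomposes over frames, it can
        # be folded into the per-frame denominator scores, with numerator
        # posteriors standing in for per-frame state-level accuracies.
        boosted = log_likes - b * num_post
        den_post = den_forward_backward(boosted)  # [T, S] denominator occupancies
        # Gradient of the objective w.r.t. the log-likelihoods.
        return num_post - den_post

Setting b = 0 recovers plain LF-MMI, matching the relationship between the two criteria.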
“…Additionally, the language model (LM) has been simplified by limiting the LM context to a phone-level four-gram, which allows for more frequent recombination of state paths. This approach has been adopted and adapted to different sequence criteria [12,13], to score fusion and system combination [14,15], and to settings where we completely dispense with the need for initial GMM models [16].…”
Section: Introduction
confidence: 99%
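As a toy illustration of that recombination (the lm_logprob function and the bookkeeping below are hypothetical, for intuition only): with a four-gram phone LM, a path's continuation depends only on its last three phones, so hypotheses that agree on that truncated history can be merged, keeping the number of live states bounded regardless of utterance length.

    import math
    from collections import defaultdict

    def logadd(a, b):
        if a == float("-inf"):
            return b
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    def extend(hyps, phones, lm_logprob):
        # hyps maps a truncated (up to 3-phone) history tuple to the
        # log-probability mass of all paths ending in that history.
        new = defaultdict(lambda: float("-inf"))
        for hist, logp in hyps.items():
            for ph in phones:
                nxt = (hist + (ph,))[-3:]  # four-gram LM: keep 3 phones
                # Paths reaching the same truncated history recombine here.
                new[nxt] = logadd(new[nxt], logp + lm_logprob(hist, ph))
        return dict(new)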