Interspeech 2013
DOI: 10.21437/interspeech.2013-548

Sequence-discriminative training of deep neural networks

Abstract: Sequence-discriminative training of deep neural networks (DNNs) is investigated on a 300-hour American English conversational telephone speech task. Different sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI. Two heuristics are investigated to improve the performance of DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training; and, for…
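
To make the shared mechanics of these criteria concrete, here is a minimal numpy sketch of the MMI error signal, assuming the numerator and denominator state occupancies have already been computed by lattice forward-backward; the function name and array shapes are illustrative, not from the paper.

```python
import numpy as np

def mmi_error_signal(gamma_num: np.ndarray,
                     gamma_den: np.ndarray,
                     kappa: float = 1.0) -> np.ndarray:
    """Gradient of the MMI objective w.r.t. the frame log-likelihoods.

    gamma_num: (T, S) state occupancies from the numerator (reference) lattice
    gamma_den: (T, S) state occupancies from the denominator (competing) lattice
    kappa:     acoustic scale applied when the lattices were generated
    """
    # The sequence-level gradient reduces to a per-frame difference of
    # occupancies, which is then backpropagated through the DNN outputs.
    return kappa * (gamma_num - gamma_den)
```

MPE and sMBR fit the same interface; they replace the reference-path occupancies with accuracy-weighted ones, as sketched further below.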

Cited by 413 publications (161 citation statements); references 22 publications.

“…The REINFORCE framework [24] enables model training in which the probabilities of hypotheses are boosted if they perform well on arbitrarily chosen metrics. In [25], sequence-discriminative criteria such as minimum word error, minimum phone error, and minimum Bayes risk are used for ASR. Similar to mWER training for ASR, [22] showed that training with non-differentiable semantic criteria directly optimized SLU metrics.…”
Section: Transcript (mentioning)
confidence: 99%
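
As a concrete illustration of the expected-risk idea behind mWER-style training, here is a minimal numpy sketch that renormalizes an N-best list and scores it by its expected word errors. The names and shapes are hypothetical, and in practice the gradient of this quantity would be taken by an autodiff framework.

```python
import numpy as np

def expected_wer(log_probs, word_errors):
    """Expected word errors of an N-best list under the model.

    log_probs:   (N,) model log-probabilities of the N-best hypotheses
    word_errors: (N,) word-error counts of each hypothesis vs. the reference
    """
    log_probs = np.asarray(log_probs, dtype=float)
    # Renormalize over the N-best list so the hypotheses form a distribution.
    posts = np.exp(log_probs - np.logaddexp.reduce(log_probs))
    return float(posts @ np.asarray(word_errors, dtype=float))
```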
“…For all our experiments, we use two microphones from opposite sides of the microphone array. All models are trained with the cross-entropy objective followed by the sMBR objective function [16]. Our test data consists of 33,000 far-field speech utterances, and our development set consists of 17,000 similar utterances.…”
Section: System Description (mentioning)
confidence: 99%
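
For reference, the sMBR error signal used in this kind of sequence training weights the denominator-lattice occupancies by how far the accuracy of paths through each state deviates from the lattice average. A minimal numpy sketch, with all inputs assumed precomputed by lattice forward-backward and all names illustrative:

```python
import numpy as np

def smbr_error_signal(gamma_den: np.ndarray,
                      acc_through: np.ndarray,
                      avg_acc: float,
                      kappa: float = 1.0) -> np.ndarray:
    """gamma_den:   (T, S) denominator-lattice state occupancies
    acc_through: (T, S) average accuracy of lattice paths through state s at t
    avg_acc:     average state accuracy over all lattice paths
    """
    # States lying on better-than-average paths receive a positive signal,
    # pushing the acoustic model to assign them more probability.
    return kappa * gamma_den * (acc_through - avg_acc)
```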
“…Next, the features were subsampled by a factor of 3 and the NNs were trained with the Lattice-Free MMI (LF-MMI) objective and bi-phone targets, as suggested in [9]. Finally, the NNs are further trained with the sequence minimum Bayes risk (sMBR) objective [10].…”
Section: Hybrid Acoustic Models (mentioning)
confidence: 99%
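
LF-MMI replaces the denominator lattice with a small HMM graph, so the per-utterance MMI loss becomes the difference of two forward log-scores. The dense-matrix sketch below is for illustration only; real implementations run the recursion over sparse graphs, and all names here are hypothetical.

```python
import numpy as np

def forward_logsum(log_likes, log_trans, log_init):
    """Total forward log-probability of an HMM graph.

    log_likes: (T, S) frame log-likelihoods for the graph's states
    log_trans: (S, S) transition log-probabilities, log_trans[i, j] for i -> j
    log_init:  (S,)   initial-state log-probabilities
    """
    alpha = log_init + log_likes[0]
    for t in range(1, len(log_likes)):
        # log-sum-exp over predecessor states, then add the frame scores
        alpha = log_likes[t] + np.logaddexp.reduce(
            alpha[:, None] + log_trans, axis=0)
    return np.logaddexp.reduce(alpha)

def lfmmi_loss(log_likes_num, num_graph, log_likes_den, den_graph):
    """Negative MMI objective: denominator score minus numerator score.

    Each *_graph is a (log_trans, log_init) pair matching its log_likes.
    """
    return (forward_logsum(log_likes_den, *den_graph)
            - forward_logsum(log_likes_num, *num_graph))
```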