2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
DOI: 10.1109/asru.2013.6707741
Semi-supervised training of Deep Neural Networks

Cited by 253 publications (299 citation statements)
References 21 publications
“…The logarithm diverges if the argument goes to zero, i.e., if the correct word sequence has zero probability in decoding. To avoid numerical issues with such utterances, we use the frame rejection heuristic described in [13], i.e., discard frames whose denominator state occupancy is close to zero, $\gamma^{(\mathrm{den})}_{ut}(s) < \varepsilon$ (here, $\varepsilon = 0.001$). No regularization (for example, $\ell_2$-regularization around the initial network) or smoothing such as the H-criterion [12] is used in this paper, as there is no empirical evidence for overfitting.…”
Section: Deep Neural Network in ASR
confidence: 99%
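
The frame rejection heuristic quoted above amounts to masking out frames whose reference state has near-zero occupancy in the denominator lattice. A minimal NumPy sketch follows; the array names and layout (a per-utterance occupancy matrix gamma_den and per-frame numerator state indices num_states) are illustrative assumptions, not the cited implementation.

import numpy as np

def frame_rejection_mask(gamma_den, num_states, eps=0.001):
    # gamma_den:  (T, S) denominator-lattice state occupancies gamma^(den)_ut(s)
    #             for one utterance (assumed layout).
    # num_states: (T,) numerator (reference) state index for each frame.
    # eps:        rejection threshold (0.001 in the quoted excerpt).
    T = gamma_den.shape[0]
    # Occupancy of the reference state in the denominator lattice, per frame.
    ref_occ = gamma_den[np.arange(T), num_states]
    # Keep only frames whose reference-state occupancy is not close to zero.
    return ref_occ >= eps
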
“…Directly minimizing the word error is a hard optimization problem and thus, several surrogates have been proposed, including maximum mutual information (MMI) [8], minimum phone error (MPE) [9] or state-level minimum Bayes risk (sMBR) [10]. Good gains have recently been reported for sequence training of DNNs [10,11,12,13].…”
Section: Introduction
confidence: 99%
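
For context on the surrogate criteria listed in this excerpt, the MMI objective is commonly written as below; this is the standard formulation (with acoustic scale $\kappa$), not a formula taken from the cited paper, and MPE/sMBR replace the log-posterior of the reference with an expected phone- or state-level accuracy.

\mathcal{F}_{\mathrm{MMI}}(\theta) \;=\; \sum_{u} \log
  \frac{p_{\theta}(O_u \mid S_{W_u})^{\kappa}\, P(W_u)}
       {\sum_{W} p_{\theta}(O_u \mid S_{W})^{\kappa}\, P(W)}

where $O_u$ are the acoustic observations of utterance $u$, $W_u$ its reference transcript, $S_W$ the HMM state sequence of hypothesis $W$, and $P(W)$ the language model probability; in practice the denominator sum runs over a lattice of competing hypotheses.
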
“…Our experience suggests that around 2% absolute WER improvement is gained by expanding the training set to 700 hours by increasing the MER threshold to 40%. We also show the results from applying a standard DNN training recipe with CE training followed by sMBR sequence training [20]. Two iterations of CE training are used, with state alignments regenerated after the first iteration.…”
Section: Baseline System
confidence: 99%
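
The recipe summarized in this excerpt (two cross-entropy passes with a realignment step in between, then sMBR sequence training) can be outlined roughly as below. The helpers train_ce, realign and train_smbr are hypothetical placeholders standing in for whatever toolkit is used; this is a sketch of the training order, not an actual API.

# Rough outline of the CE -> realign -> CE -> sMBR pipeline (hypothetical helpers).
def train_dnn_recipe(features, transcripts, init_alignments):
    # First cross-entropy pass on the initial (e.g., GMM-derived) state alignments.
    dnn = train_ce(features, init_alignments)
    # Regenerate state alignments with the CE-trained DNN, then run a second CE pass.
    alignments = realign(dnn, features, transcripts)
    dnn = train_ce(features, alignments, init_model=dnn)
    # Finish with sequence-discriminative training under the sMBR criterion.
    dnn = train_smbr(dnn, features, transcripts, alignments)
    return dnn
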
“…The deep neural network (DNN) has 7 layers, each with 2048 units, and it employs 5 frames of left and right context around the current input frame (i.e., 11 × 40 = 440 input units). It is trained using standard restricted Boltzmann machine pre-training, cross-entropy training and sequence-discriminative training using the state-level minimum Bayes risk criterion (Veselý et al., 2013).…”
Section: Speech Recognition
confidence: 99%
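
The network shape described in this excerpt (440 inputs from 11 stacked 40-dimensional frames, 7 layers of 2048 units, read here as 7 hidden layers) can be sketched in PyTorch as below. The sigmoid activations and the output dimension NUM_PDFS (number of tied HMM states) are assumptions for illustration; the excerpt does not specify them.

import torch.nn as nn

# Sketch of the described DNN: 440 inputs, 7 hidden layers of 2048 units,
# and a hypothetical log-softmax output over NUM_PDFS tied HMM states.
NUM_PDFS = 4000  # hypothetical; the excerpt does not give the output size

layers = []
in_dim = 11 * 40  # 5 left + 5 right + current frame, 40 features each = 440
for _ in range(7):
    layers += [nn.Linear(in_dim, 2048), nn.Sigmoid()]
    in_dim = 2048
layers += [nn.Linear(in_dim, NUM_PDFS), nn.LogSoftmax(dim=-1)]

dnn = nn.Sequential(*layers)
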