2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2015
DOI: 10.1109/asru.2015.7404851
Acoustic modelling with CD-CTC-sMBR LSTM RNNs

Cited by 36 publications
(6 citation statements)
References 8 publications
“…The traditional hybrid DNN/Hidden Markov Model (HMM) approach uses a neural network to produce a posterior distribution over tied HMM states [14,15] for each acoustic frame, usually followed by sequence-discriminative training to boost performance [16]. CTC [17] has become an alternative criterion to frame-level cross-entropy (CE) training or sequence-level lattice-free MMI (LF-MMI) training in recent years and has shown promising results [18][19][20][21][22]. Inspired by the rise of end-to-end training in machine translation, the encoder-decoder architecture was also introduced for ASR, e.g.…”
Section: Introduction
Mentioning confidence: 99%
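The CTC criterion mentioned in the excerpt above sums the probability of every blank-augmented frame alignment that collapses to the target label sequence. A minimal toy sketch of that forward recursion is below (a hypothetical illustration in plain Python, not code from the cited papers; `ctc_prob` and its arguments are names invented here):

```python
# Toy sketch of the CTC forward algorithm: computes P(labels | frames)
# by summing over all blank-augmented alignments with the standard
# alpha recursion. Purely illustrative; real systems work in log space.

def ctc_prob(frame_probs, labels, blank=0):
    """frame_probs: T x V per-frame output distributions.
    labels: target label ids without blanks."""
    # Extend the label sequence with blanks: b, l1, b, l2, b, ...
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), len(frame_probs)

    # alpha[s] = total probability of all partial alignments that
    # end at extended position s after the current frame.
    alpha = [0.0] * S
    alpha[0] = frame_probs[0][ext[0]]
    if S > 1:
        alpha[1] = frame_probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]              # stay on the same symbol
            if s > 0:
                total += alpha[s - 1]     # advance by one position
            # Skip a blank, allowed only between distinct labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the final label or the trailing blank.
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
```

For two frames with a uniform distribution over {blank, "a"}, three of the four alignments ("a a", "a -", "- a") collapse to "a", so `ctc_prob([[0.5, 0.5], [0.5, 0.5]], [1])` returns 0.75, matching direct enumeration.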
“…As for end-to-end models for child ASR, Andrew et al (2015) show improvement on child speech with a CTC-based system jointly trained on very large quantities of mixed adult and child speech data. The use of seq2seq models for child speech recognition is a new research subject, as shown by the very recent technical reports on this matter (Ng et al, 2020; Chen et al, 2020).…”
Section: Related Work
Mentioning confidence: 99%
“…They do not necessarily lead to a minimized recognition error rate in LVCSR tasks. Therefore, many discriminative training methods such as MCE [25], maximum mutual information (MMI) [26,27], minimum phone error (MPE) [28], state-level minimum Bayes risk (sMBR) [29,30] and boosted MMI [31] have been proposed to further refine DNN [32] and LSTM [33] acoustic models. For the keyword spotting task based on LVCSR, our goal is to minimize the recognition error on the set of keywords, whereas the aforementioned methods focus on minimizing the recognition error rate over all possible words, which makes them unsuitable for the keyword spotting task.…”
Section: Non-uniform BMCE Training of Deep BLSTM Acoustic Model for K…
Mentioning confidence: 99%
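The sMBR-style criteria listed in the excerpt above all share one shape: instead of maximizing per-frame likelihood, they minimize the posterior-weighted expected error over competing hypotheses. A tiny sketch of that objective (a hypothetical illustration with invented names, not the cited papers' recipes, which operate on lattices of HMM states):

```python
# Toy sketch of a sequence-level expected-risk objective in the spirit
# of sMBR/MPE: given posterior weights over competing hypotheses and
# each hypothesis's error count against the reference, the training
# objective is the posterior-weighted expected error.

def expected_risk(hyp_posteriors, hyp_errors):
    """hyp_posteriors: model posteriors over N-best hypotheses (sum to 1).
    hyp_errors: error count of each hypothesis vs. the reference."""
    assert abs(sum(hyp_posteriors) - 1.0) < 1e-9, "posteriors must sum to 1"
    return sum(p * e for p, e in zip(hyp_posteriors, hyp_errors))
```

With posteriors [0.7, 0.2, 0.1] over three hypotheses with 0, 2, and 3 errors, the expected risk is 0.7; training lowers this value by shifting posterior mass toward low-error hypotheses, which is exactly the behavior the excerpt's authors restrict to a keyword set.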