2013 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2013.6639014

Multi-level adaptive networks in tandem and hybrid ASR systems

Abstract: In this paper we investigate the use of multi-level adaptive networks (MLAN) to incorporate out-of-domain data when training large vocabulary speech recognition systems. In a set of experiments on multi-genre broadcast data and on TED lecture recordings, we present results using out-of-domain features in a hybrid DNN system and explore tandem systems using a variety of input acoustic features. Our experiments indicate that using the MLAN approach in both hybrid and tandem systems results in consistent reductions …
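To make the tandem/MLAN idea in the abstract concrete, here is a minimal, illustrative sketch: a network trained on out-of-domain data produces features that are concatenated with the in-domain acoustics and passed through a second, in-domain network, whose outputs then serve as tandem features. The dimensions, network shapes, and random (untrained) weights below are assumptions for illustration only, not the paper's actual configuration.

```python
# Minimal sketch of the tandem / MLAN feature idea (toy dimensions,
# generic feed-forward nets; training is omitted).
import numpy as np

rng = np.random.default_rng(0)

def make_net(dims):
    """Randomly initialised (W, b) layers; stands in for a trained DNN."""
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(net, x):
    """Feed-forward pass with tanh hidden units; the last layer is left
    linear so its activations can be used as features."""
    h = x
    for W, b in net[:-1]:
        h = np.tanh(h @ W + b)
    W, b = net[-1]
    return h @ W + b

# Hypothetical 39-dim acoustic frames and a 26-dim feature layer.
n_frames, acoustic_dim, feat_dim = 100, 39, 26
frames = rng.standard_normal((n_frames, acoustic_dim))

# 1) A first network, trained on out-of-domain data, produces features.
ood_net = make_net([acoustic_dim, 512, feat_dim])
ood_feats = forward(ood_net, frames)

# 2) MLAN: concatenate those features with the in-domain acoustics and
#    feed them to a second, in-domain network.
mlan_input = np.concatenate([frames, ood_feats], axis=1)
indomain_net = make_net([acoustic_dim + feat_dim, 512, feat_dim])
tandem_feats = forward(indomain_net, mlan_input)

# 3) Tandem use: these features would be appended to the acoustics and
#    modelled by a GMM-HMM; in a hybrid system the second network would
#    instead output HMM state posteriors.
print(tandem_feats.shape)  # (100, 26)
```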

Cited by 36 publications (28 citation statements)
References 31 publications (42 reference statements)
“…Encouraged by the success of DNNs in the hybrid approach, researchers reevaluated the tandem approach using DNNs and achieved similar performance improvements [3,14-20]. Some comparative studies were conducted for the hybrid and tandem approaches, though no evidence supports that one approach clearly outperforms the other [21,22].…”
Section: Introduction
confidence: 99%
“…Speech recognition was performed using a system [1] trained primarily over TED talks as used for the IWSLT 2012 ASR evaluation. The system has two passes of decoding, both using hybrid models in which HMM observation probabilities are computed using a deep neural network.…”
Section: Audio Processing and Speech Recognition
confidence: 99%
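For readers unfamiliar with the hybrid setup described in the excerpt above, the sketch below shows the standard scaled-likelihood conversion of DNN state posteriors into HMM observation scores, p(x|s) ∝ p(s|x) / p(s). The posteriors, priors, and dimensions are synthetic placeholders, not values from the cited system.

```python
# Minimal sketch: turning DNN state posteriors into HMM observation
# scores via the scaled-likelihood trick used in hybrid systems.
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_states = 5, 10

# Pretend DNN output: per-frame posteriors over tied HMM states.
logits = rng.standard_normal((n_frames, n_states))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# State priors, normally estimated from the training alignments;
# uniform here purely for illustration.
priors = np.full(n_states, 1.0 / n_states)

# Scaled log-likelihoods that would be handed to the HMM decoder.
log_obs = np.log(posteriors) - np.log(priors)
print(log_obs.shape)  # (5, 10)
```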
“…However, it was only found recently that an MLP with a large set of context-dependent targets and many hidden layers, i.e., a context-dependent deep neural network (CD-DNN), could significantly improve recognition performance [3-5]. Although CD-DNNs have demonstrated favourable performance in various speech recognition tasks [4-9], an existing well-trained traditional GMM-HMM has to be used for two main aspects of training: state-to-frame alignments and defining a set of tied context-dependent states [3,4].…”
Section: Introduction
confidence: 99%
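The role of the GMM-HMM mentioned in this excerpt can be made concrete with a small sketch: a forced alignment assigns each frame a tied-state label, and those labels become the frame-level targets for cross-entropy training of the CD-DNN. The alignment and state inventory below are invented for illustration.

```python
# Minimal sketch: frame-level CD-DNN targets derived from a GMM-HMM
# forced alignment (the alignment here is made up).
import numpy as np

n_tied_states = 6
# Hypothetical state-to-frame alignment from an existing GMM-HMM.
alignment = np.array([0, 0, 3, 3, 3, 5, 5, 2, 2, 2])

# One-hot targets for frame-wise cross-entropy training of the CD-DNN.
targets = np.eye(n_tied_states)[alignment]
print(targets.shape)  # (10, 6)
```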