Interspeech 2017
DOI: 10.21437/interspeech.2017-1747

Parallel Neural Network Features for Improved Tandem Acoustic Modeling

Abstract: The combination of acoustic models or features is a standard approach to exploiting various knowledge sources. This paper investigates the concatenation of different bottleneck (BN) neural network (NN) outputs for tandem acoustic modeling; the combination of NN features is thus performed via Gaussian mixture models (GMMs). Complementarity between the NN feature representations is attained by using various network topologies, LSTM recurrent, feed-forward, and hierarchical, as well as different non-linearities: hyperbo…
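To make the tandem combination concrete, here is a minimal sketch, assuming two already-trained bottleneck extractors whose per-frame outputs are available as arrays; the feature dimensions, component count, and the use of scikit-learn's GaussianMixture are illustrative stand-ins, not the paper's actual GMM-HMM pipeline.

```python
# Minimal sketch of parallel (tandem) feature combination: concatenate the
# bottleneck (BN) outputs of two complementary networks and model the result
# with a GMM. All sizes and the GMM itself are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for per-frame BN outputs of two complementary networks,
# e.g. an LSTM-based and a feed-forward hierarchical BN extractor.
bn_lstm = rng.standard_normal((1000, 40))   # (frames, BN dim of network 1)
bn_ffnn = rng.standard_normal((1000, 40))   # (frames, BN dim of network 2)

# Parallel combination: concatenate the BN features frame by frame.
tandem_features = np.concatenate([bn_lstm, bn_ffnn], axis=1)  # (frames, 80)

# Model the concatenated features with a GMM, standing in for the
# state-tied GMM acoustic model of a tandem system.
gmm = GaussianMixture(n_components=16, covariance_type="diag", random_state=0)
gmm.fit(tandem_features)
print("average per-frame log-likelihood:", gmm.score(tandem_features))
```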

Cited by 8 publications (7 citation statements). References 44 publications.
“…Our cross validation (CV) set was defined only on the Switchboard part, randomly selecting around 10% of the recordings. The details of the speaker adaptive and discriminatively (MPE-SA) trained acoustic model (AM) are described in [18]. Perplexities (PPL) and recognition results are reported on the complete Hub5 2000 (Hub5'00) test set.…”
Section: Experimental Setup, 2.1 Speech Corpora (mentioning; confidence: 99%)
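As a rough illustration of the recording-level split described in the quote above, the following is a minimal sketch; the recording IDs and the exact 10% ratio are assumptions, and the actual setup is the one detailed in [18].

```python
# Hedged sketch of a recording-level cross-validation split: hold out roughly
# 10% of the recordings at random. IDs and ratio are illustrative only.
import random

recording_ids = [f"sw0{n}" for n in range(2001, 2101)]  # hypothetical IDs

random.seed(42)
random.shuffle(recording_ids)

n_cv = max(1, round(0.10 * len(recording_ids)))
cv_set = sorted(recording_ids[:n_cv])
train_set = sorted(recording_ids[n_cv:])

print(f"{len(train_set)} training recordings, {len(cv_set)} CV recordings")
```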
“…The task focused on recognizing German Skype conversations. The speaker adaptive and sequence discriminative AMs were trained according to [18]. For this task, we train language models on a corpus of 1-billion words covering 11 different domains [25].…”
Section: Experimental Setup, 2.1 Speech Corpora (mentioning; confidence: 99%)
“…The rest of the transcriptions, which amounts to 26.7 M running words, are used as training data for all language models: both the 4-gram Kneser Ney count model (KN4) [23] and neural models. This selection is the same as in [24]. A vocabulary size of 30K is used.…”
Section: Data Description (mentioning; confidence: 99%)
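For readers unfamiliar with the "KN4" baseline mentioned above, a small sketch of a 4-gram Kneser-Ney count model follows, using NLTK purely for illustration; the toy corpus, default discount, and evaluation on seen text are assumptions and do not reflect the cited 26.7 M-word, 30K-vocabulary setup.

```python
# Hedged sketch of a 4-gram Kneser-Ney language model ("KN4") with NLTK.
# Corpus and evaluation are toy-sized and purely illustrative.
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import ngrams

order = 4
corpus = [
    ["we", "report", "results", "on", "the", "hub5", "test", "set"],
    ["a", "4-gram", "count", "model", "serves", "as", "the", "baseline"],
]

# Build padded n-gram training data and the vocabulary stream, then fit.
train_ngrams, vocab = padded_everygram_pipeline(order, corpus)
lm = KneserNeyInterpolated(order)
lm.fit(train_ngrams, vocab)

# Sanity check: perplexity of a sentence the model has already seen.
seen = list(ngrams(pad_both_ends(corpus[0], n=order), order))
print("PPL on seen text:", lm.perplexity(seen))
```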
“…However, the gap disappears after distillation. Our baseline ASR setup is based on the system presented in [24]. From that system, we only use one acoustic model based on a 5-layer bidirectional LSTM-RNN with 500 nodes in each layer and the rescoring pipeline is simplified by applying the lattice rescoring only once using a single neural LM.…”
Section: MLP vs. CNN as the Student Model (mentioning; confidence: 99%)
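The acoustic-model topology mentioned in the quote can be sketched as follows, here in PyTorch; the input feature dimension and the tied-state inventory size are assumptions, and the cited system itself is the one presented in [24].

```python
# Hedged sketch of a 5-layer bidirectional LSTM acoustic model with 500 nodes
# per layer and direction. Feature and output sizes are illustrative.
import torch
import torch.nn as nn

class BLSTMAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=500, layers=5, n_states=9000):
        super().__init__()
        self.blstm = nn.LSTM(
            input_size=feat_dim,
            hidden_size=hidden,
            num_layers=layers,
            bidirectional=True,
            batch_first=True,
        )
        # Output layer over tied HMM states (inventory size is an assumption).
        self.output = nn.Linear(2 * hidden, n_states)

    def forward(self, features):           # (batch, frames, feat_dim)
        encoded, _ = self.blstm(features)  # (batch, frames, 2 * hidden)
        return self.output(encoded)        # per-frame state logits

model = BLSTMAcousticModel()
dummy = torch.randn(2, 100, 40)            # 2 utterances, 100 frames each
print(model(dummy).shape)                  # torch.Size([2, 100, 9000])
```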
“…Main experiments are conducted on the 300 hour Switchboard English conversational telephone speech task [12] being the most studied ASR benchmark today [2,3,19,21,[25][26][27]. We used Switchboard-1 Release 2 (LDC97S62) as the training set.…”
Section: Datasets (mentioning; confidence: 99%)