Interspeech 2015
DOI: 10.21437/interspeech.2015-632
The IBM 2015 English conversational telephone speech recognition system

Cited by 44 publications (33 citation statements)
References 23 publications
Citation types: 2 supporting, 31 mentioning, 0 contrasting
“…We remind the reader that maxout nets [12] generalize ReLU units by employing non-linearities of the form $s_i = \max_{j \in C(i)} w_j^T x + b_j$, where the subsets of neurons $C(i)$ are typically disjoint. In [11] we showed that maxout DNNs and CNNs trained with annealed dropout outperform their sigmoid-based counterparts in both the 300-hour and 2000-hour training regimes. What was missing there was a comparison between maxout and sigmoid for unfolded RNNs [4].…”
Section: Recurrent Nets With Maxout Activations (mentioning)
confidence: 95%
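To make the maxout non-linearity concrete, here is a minimal NumPy sketch of the formula quoted above. The function name, shapes, and the choice of equal-sized consecutive groups are illustrative assumptions; the only fixed ingredient is that each output takes the max over a disjoint set of linear pieces.

```python
import numpy as np

def maxout(x, W, b, group_size):
    """Maxout activation: s_i = max_{j in C(i)} (w_j^T x + b_j),
    where the groups C(i) partition the linear pieces (disjoint subsets).
    Here the partition is assumed to be consecutive blocks of `group_size`."""
    z = W @ x + b                      # all linear pieces, shape (num_pieces,)
    z = z.reshape(-1, group_size)      # one row per output neuron i, i.e. per group C(i)
    return z.max(axis=1)               # keep the largest piece in each group

# Toy usage: 8 linear pieces grouped into 4 maxout neurons (group_size=2).
rng = np.random.default_rng(0)
x = rng.standard_normal(10)            # input vector
W = rng.standard_normal((8, 10))       # weights w_j of each linear piece
b = rng.standard_normal(8)             # biases b_j
print(maxout(x, W, b, group_size=2))   # 4 maxout outputs
```

With group_size=1 this reduces to a plain linear layer; ReLU is recovered as the max of a learned piece and a fixed zero piece, which is the sense in which maxout generalizes ReLU.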
“…The decodings are done with a small vocabulary of 30K words and a small 4-gram language model with 4M n-grams. Note that the sigmoid RNNs have better error rates than those reported in [11] because they were retrained after the data was realigned with the best joint RNN/CNN model. We observe that the maxout RNNs are consistently better and that, by themselves, they achieve a WER similar to that of our previous best model, the joint RNN/CNN with sigmoid activations.…”
Section: Recurrent Nets With Maxout Activations (mentioning)
confidence: 96%
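For readers unfamiliar with n-gram decoding LMs, here is a minimal Python sketch of how a 4-gram language model assigns word probabilities from counts. All names are hypothetical, and a production LM like the one cited would use smoothing (e.g. Kneser-Ney) and backoff rather than raw maximum-likelihood estimates.

```python
from collections import defaultdict

def train_4gram(sentences):
    """Collect maximum-likelihood 4-gram counts (no smoothing or backoff)."""
    num = defaultdict(int)   # counts of (w1, w2, w3, w4)
    den = defaultdict(int)   # counts of the (w1, w2, w3) history
    for sent in sentences:
        toks = ["<s>"] * 3 + sent.split() + ["</s>"]
        for i in range(3, len(toks)):
            hist = tuple(toks[i - 3:i])
            num[hist + (toks[i],)] += 1
            den[hist] += 1
    return num, den

def prob(num, den, history, word):
    """P(word | last three words of history), zero if the history is unseen."""
    hist = tuple(history[-3:])
    return num[hist + (word,)] / den[hist] if den[hist] else 0.0

num, den = train_4gram(["the cat sat", "the cat ran"])
print(prob(num, den, ["<s>", "<s>", "the"], "cat"))  # 1.0
```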
“…The training is accomplished using the IBM Attila toolkit [24] on 600 hours of conversational telephone speech (CTS) data from the Fisher corpus [25], with a 9-frame context of 40-dimensional speaker-adapted feature vectors obtained using per-recording fMLLR transforms [16,17]. The fMLLR transforms are generated for each recording with decoding alignments obtained from a GMM-HMM acoustic model (see [26,27] for more details).…”
Section: DNN System Configuration (mentioning)
confidence: 99%
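As a rough illustration of the input pipeline described above, the following NumPy sketch stacks each 40-dimensional frame with its four neighbours on either side to form the 9-frame (360-dimensional) DNN input. The function name and the edge-padding choice (repeating boundary frames) are assumptions, not details taken from the paper.

```python
import numpy as np

def splice_frames(feats, context=4):
    """Stack each frame with +/- `context` neighbours: for 40-dim fMLLR
    features and context=4, this yields the 9-frame, 360-dim DNN input.
    Utterance edges are handled by repeating the first/last frame."""
    T, D = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], context, axis=0),
                             feats,
                             np.repeat(feats[-1:], context, axis=0)])
    # For frame t, concatenate padded frames t .. t + 2*context,
    # which are the original frames t-context .. t+context.
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1)
                     for t in range(T)])

utt = np.random.randn(100, 40)   # 100 frames of 40-dim fMLLR features
inputs = splice_frames(utt)
print(inputs.shape)              # (100, 360)
```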