2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6853589

Improving deep neural network acoustic models using generalized maxout networks

Abstract: Recently, maxout networks have brought significant improvements to various speech recognition and computer vision tasks. In this paper we introduce two new types of generalized maxout units, which we call p-norm and soft-maxout. We investigate their performance in Large Vocabulary Continuous Speech Recognition (LVCSR) tasks in various languages with 10 hours and 60 hours of data, and find that the p-norm generalization of maxout consistently performs well. Because, in our training setup, we sometimes see insta…
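The p-norm unit named in the abstract replaces maxout's group-wise max with a group-wise p-norm of the linear activations, y = (Σ_i |x_i|^p)^(1/p). Below is a minimal NumPy sketch of that idea; the group size and the value of p are illustrative placeholders, not the settings reported in the paper.

```python
import numpy as np

def pnorm_unit(x, group_size=10, p=2.0):
    """p-norm generalization of maxout: each output is the p-norm of a
    group of linear activations, y_j = (sum_i |x_i|^p)^(1/p).
    group_size and p are illustrative values, not the paper's tuned settings."""
    x = np.asarray(x, dtype=float)
    assert x.size % group_size == 0, "input dim must be a multiple of group_size"
    groups = x.reshape(-1, group_size)          # one row per output unit
    return np.power(np.sum(np.abs(groups) ** p, axis=1), 1.0 / p)

# Example: 20 linear activations reduced to 2 p-norm outputs (p = 2)
h = np.random.randn(20)
print(pnorm_unit(h, group_size=10, p=2.0))
```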

Citations: cited by 240 publications (144 citation statements)
References: 15 publications
“…We use the Kaldi speech recognition tools (Povey et al., 2011) to build our Spanish ASR systems. Our state-of-the-art ASR system is the p-norm DNN system of (Zhang et al., 2014). The word error rates on the dev and test sets of the Fisher dataset (dev, dev-2, test) are 29.80%, 29.79% and 25.30% respectively.…”
Section: Results (mentioning)
confidence: 99%
“…For the language model we used a pruned version of the standard trigram language model that is distributed with the WSJ corpus. The acoustic models, referred to as SAT in the tables, are speaker-adapted GMM models [18,19], and those referred to as DNN are based on deep neural networks with p-norm non-linearities [23], trained and tested on top of fMLLR features. The models estimated on LibriSpeech's training data are named after the amount of audio they were built on.…”
Section: Methods (mentioning)
confidence: 99%
“…Variations of ReLU, such as leaky ReLU [41], parametric ReLU [42], and exponential LU [43], have also been explored for improved accuracy. Finally, a non-linearity called maxout, which takes the max value of two intersecting linear functions, has been shown to be effective in speech recognition tasks [44,45].…”
Section: A. Convolutional Neural Networks (CNNs) (mentioning)
confidence: 99%
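As the excerpt above describes, a maxout unit outputs the maximum over a small group of linear activations; with groups of two this is exactly the "max of two intersecting linear functions". A minimal NumPy sketch of that non-linearity, using an illustrative group size of two:

```python
import numpy as np

def maxout_unit(x, group_size=2):
    """Maxout non-linearity: each output is the maximum over a group of
    linear activations. With group_size=2 this matches the 'max of two
    intersecting linear functions' description in the excerpt above."""
    x = np.asarray(x, dtype=float)
    assert x.size % group_size == 0, "input dim must be a multiple of group_size"
    groups = x.reshape(-1, group_size)   # one row per output unit
    return groups.max(axis=1)

# Example: 6 linear activations reduced to 3 maxout outputs
print(maxout_unit(np.array([0.3, -1.2, 2.0, 0.5, -0.1, -0.4])))
```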