In this paper we describe a method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training. We use the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI. To make its computation feasible we use a phone n-gram language model in place of the word language model. To further reduce its space and time complexity we compute the objective function using neural network outputs at one third the standard frame rate. These changes enable us to perform the computation for the forward-backward algorithm on GPUs. Further, the reduced output frame rate also provides a significant speed-up during decoding. We present results on 5 different LVCSR tasks with training data ranging from 100 to 2100 hours. Models trained with LF-MMI provide a relative word error rate reduction of ∼11.5% over those trained with the cross-entropy objective function, and ∼8% over those trained with the cross-entropy and sMBR objective functions. A further relative reduction of ∼2.5% can be obtained by fine-tuning these models with the word-lattice based sMBR objective function.
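For reference, the MMI criterion named here has a standard textbook form; the sketch below uses our own notation, not the paper's, and in LF-MMI the denominator sum is approximated with a phone n-gram graph rather than word lattices.

```latex
% MMI objective over utterances u, with acoustics x_u, reference transcript w_u,
% acoustic scale kappa, and (phone n-gram) language model probability P(w).
\mathcal{F}_{\mathrm{MMI}} = \sum_{u} \log
  \frac{p(x_u \mid w_u)^{\kappa}\, P(w_u)}
       {\sum_{w} p(x_u \mid w)^{\kappa}\, P(w)}
```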
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
Long Short-Term Memory networks (LSTMs) are a component of many state-of-the-art DNN-based speech recognition systems. Dropout is a popular method for improving generalization in DNN training. In this paper we describe extensive experiments in which we investigated the best way to combine dropout with LSTMs, specifically projected LSTMs (LSTMPs). We investigated various locations in the LSTM at which to apply dropout (and various combinations of locations), as well as a variety of dropout schedules. Our optimized recipe gives consistent improvements in WER across a range of datasets, including Switchboard, TED-LIUM and AMI.
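As a concrete illustration, and not the paper's recipe: PyTorch's nn.LSTM exposes the LSTMP's recurrent projection via proj_size, and its dropout argument applies dropout between stacked layers, which is one of several placements a study like this compares. All dimensions below are assumed for illustration.

```python
import torch
import torch.nn as nn

# Minimal LSTMP-with-dropout sketch (assumed dimensions, not the paper's setup).
lstmp = nn.LSTM(input_size=40,     # e.g. 40-dim filterbank features
                hidden_size=1024,  # cell dimension
                proj_size=256,     # recurrent/output projection dimension
                num_layers=3,
                dropout=0.2,       # dropout on inter-layer outputs only
                batch_first=True)

x = torch.randn(8, 100, 40)        # (batch, time, features)
y, (h, c) = lstmp(x)               # y: (8, 100, 256) projected outputs
```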
We present a new framework for the evaluation of speech representations in zero-resource settings, which extends and complements previous work by Carlin, Jansen and Hermansky [1]. In particular, we replace their Same/Different discrimination task with several Minimal-Pair ABX (MP-ABX) tasks. We explain the analytical advantages of this new framework and apply it to decompose the standard signal processing pipelines for computing PLP and MFC coefficients. This method enables us to confirm and quantify a variety of well-known and not-so-well-known results in a single framework.
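To make the ABX logic concrete, here is a minimal sketch under simplifying assumptions: each token is a fixed-length feature vector (real pipelines typically align frame sequences with DTW) and the distance is Euclidean; the function and variable names are ours, not the paper's. The score is the fraction of (A, B, X) triplets, where X shares its category with A, for which X lands closer to A than to B.

```python
import numpy as np

def abx_score(triplets):
    """Fraction of (A, B, X) triplets where X is closer to A than to B."""
    correct = 0
    for a, b, x in triplets:
        if np.linalg.norm(x - a) < np.linalg.norm(x - b):
            correct += 1
    return correct / len(triplets)

# Toy usage with random 13-dim "MFCC-like" vectors: A and X drawn from one
# distribution, B from a shifted one, so the score should exceed chance (0.5).
rng = np.random.default_rng(0)
toy = [(rng.normal(0, 1, 13), rng.normal(1, 1, 13), rng.normal(0, 1, 13))
       for _ in range(1000)]
print(abx_score(toy))
```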