Highway long short-term memory RNNs for distant speech recognition

Zhang, Yu; Chen, Guoguo; Yu, Dong; Yao, Kaisheng; Khudanpur, Sanjeev; Glass, James

doi:10.1109/icassp.2016.7472780

Cited by 264 publications (221 citation statements, 216 mentioning); references 18 publications.

“…A recent study in advanced acoustic modeling using deep long short-term memory (LSTM) recurrent neural networks reported significant improvement for AMI's single distant microphone (SDM) task with 47.7% WER, even though it does not consider multi-channel inputs [51]. This work may not be directly compared with our results since it used sequence discriminative training with dropout and DNN to force align the training data to generate labels for LSTM training.…”
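The statement above reports results as word error rate (WER). As a reminder of what that figure measures, here is a minimal Python sketch of the conventional definition: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words. It is an illustration only, not the scoring pipeline used in either paper; real scoring setups also apply text normalization, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the): 2 edits / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 100%.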

Section: Multi-channel Integration In Acoustic Modeling (contrasting)

confidence: 47%

Feature mapping using far-field microphones for distant speech recognition

Himawan, Motlíček, Sridharan (2016), Speech Communication

Acoustic modeling based on deep architectures has recently achieved remarkable success, substantially improving accuracy on several automatic speech recognition (ASR) tasks. For distant speech recognition, multi-channel deep neural network approaches rely on the powerful modeling capability of deep neural networks (DNNs) to learn a suitable representation of distant speech directly from its multi-channel source. In this model-based combination of multiple microphones, features from each channel are concatenated and used together as the DNN input. This allows the multi-channel audio to be integrated into acoustic modeling without any pre-processing steps. Despite the powerful modeling capabilities of DNNs, environmental mismatch due to noise and reverberation may severely degrade performance when features are simply fed to a DNN without a feature enhancement step. In this paper, we introduce a nonlinear bottleneck feature mapping approach that uses a DNN to transform noisy and reverberant features into their clean versions. Bottleneck features trained on clean signals are used as the teacher signal because they contain information relevant to phoneme classification, and the mapping is performed with the objective of suppressing noise and reverberation. The individual and combined impacts of beamforming and speaker adaptation techniques, along with the feature mapping, are examined for distant large vocabulary speech recognition using single and multiple far-field microphones. As an alternative to beamforming, experiments that concatenate the features of multiple channels are also conducted. Experimental results on the AMI meeting corpus show that feature mapping, combined with beamforming and speaker adaptation, yields a distant speech recognition word error rate (WER) below 50% using a DNN acoustic model.
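The model-based combination described above, in which features from each channel are concatenated into a single DNN input, can be sketched in a few lines of Python. This is a minimal illustration under assumptions the abstract does not state: each channel is a list of per-frame feature vectors of equal length, and every frame is spliced with a symmetric context window (edge frames repeated at utterance borders) before concatenation. The function names are hypothetical.

```python
from typing import List

Frame = List[float]

def splice(frames: List[Frame], t: int, context: int) -> Frame:
    """Stack frames t-context .. t+context, repeating edge frames at the borders."""
    out: Frame = []
    for k in range(t - context, t + context + 1):
        k = min(max(k, 0), len(frames) - 1)  # clamp to a valid frame index
        out.extend(frames[k])
    return out

def concat_channels(channels: List[List[Frame]], context: int = 0) -> List[Frame]:
    """Build one DNN input vector per frame by concatenating the (spliced)
    features of every microphone channel, channel 0 first."""
    num_frames = len(channels[0])
    if any(len(ch) != num_frames for ch in channels):
        raise ValueError("all channels must have the same number of frames")
    inputs: List[Frame] = []
    for t in range(num_frames):
        vec: Frame = []
        for ch in channels:
            vec.extend(splice(ch, t, context))
        inputs.append(vec)
    return inputs

ch0 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
ch1 = [[10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
print(concat_channels([ch0, ch1], context=1)[0])
```

With two channels, 2-dimensional features, and `context=1`, each input vector has 2 × 3 × 2 = 12 dimensions, so the input size grows linearly with the number of microphones.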

“…Highway Connections. To alleviate the vanishing gradient problem when training deep BiLSTMs, we use gated highway connections (Zhang et al., 2016; Srivastava et al., 2015). We include transform gates r_t to control the weight of linear and non-linear transformations between layers (see Figure 1).…”
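The statement above describes transform gates r_t that weight a layer's non-linear (LSTM) output against a linear transformation of its input. Below is a minimal Python sketch of such a gated highway connection, assuming the form h = r ⊙ h' + (1 − r) ⊙ W_h x with r = σ(·). How the gate's pre-activation is computed (for example, from the previous hidden state and the layer input) is left to the caller, the function names are hypothetical, and the exact parameterizations in He et al. (2017) and Zhang et al. (2016) may differ.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(w: Mat, x: Vec) -> Vec:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def highway_output(h_lstm: Vec, x: Vec, gate_pre: Vec, w_h: Mat) -> Vec:
    """Gated highway mix for one layer (assumed form):
        r = sigmoid(gate_pre)
        h = r * h_lstm + (1 - r) * (W_h x)
    h_lstm is the layer's non-linear (LSTM) output, W_h x a linear projection
    of the layer input, and gate_pre the transform gate's pre-activation."""
    r = [sigmoid(v) for v in gate_pre]
    proj = matvec(w_h, x)
    return [ri * hi + (1.0 - ri) * pi for ri, hi, pi in zip(r, h_lstm, proj)]

# r = 0.5 everywhere and W_h = identity: output is the average of h_lstm and x.
print(highway_output([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # [2.0, 3.0]
```

When the gate saturates open (r ≈ 1) the layer passes its LSTM output through; when it saturates closed (r ≈ 0) the layer reduces to the linear path, which is what lets gradients flow through deep stacks.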

Section: Deep BiLSTM Model (mentioning)

confidence: 99%

“…Following Zhou and Xu (2015), we treat SRL as a BIO tagging problem and use deep bidirectional LSTMs. However, we differ by (1) simplifying the input and output layers, (2) introducing highway connections (Srivastava et al., 2015; Zhang et al., 2016), (3) using recurrent dropout (Gal and Ghahramani, 2016), (4) decoding with BIO constraints, and (5) ensembling with a product of experts. Our model gives a 10% relative error reduction over the previous state of the art on the test sets of CoNLL 2005 and 2012.…”
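Decoding with BIO constraints means searching only over tag sequences that are well-formed. The minimal Python sketch below runs Viterbi search over per-token scores with one hard rule, that I-X may only follow B-X or I-X (so a sequence also cannot start with I-X). It assumes additive per-token scores such as log-probabilities; the function names are hypothetical, and He et al. may impose further SRL-specific constraints not modeled here.

```python
import math
from typing import Dict, List

def allowed(prev: str, cur: str) -> bool:
    """BIO rule: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], cur)
    return True

def constrained_viterbi(scores: List[Dict[str, float]], tags: List[str]) -> List[str]:
    """Highest-scoring tag sequence that obeys the BIO rule.
    scores[t][tag] is an additive per-token score (e.g. a log-probability)."""
    neg_inf = -math.inf
    # best[t][tag]: best total score of a valid prefix ending in `tag` at position t
    best = [{tag: neg_inf if tag.startswith("I-") else scores[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(scores)):
        best.append({})
        back.append({})
        for cur in tags:
            cands = [(best[t - 1][prev], prev) for prev in tags if allowed(prev, cur)]
            score, arg = max(cands)
            best[t][cur] = score + scores[t][cur]
            back[t][cur] = arg
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for t in range(len(scores) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

tags = ["O", "B-A0", "I-A0"]
scores = [
    {"O": math.log(0.6), "B-A0": math.log(0.3), "I-A0": math.log(0.1)},
    {"O": math.log(0.2), "B-A0": math.log(0.1), "I-A0": math.log(0.7)},
]
print(constrained_viterbi(scores, tags))  # ['B-A0', 'I-A0']
```

A per-token argmax on this example would produce `O`, `I-A0`, which is not a valid BIO sequence; the constrained search returns the best valid alternative instead.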

Section: Introduction (mentioning)

confidence: 99%

Deep Semantic Role Labeling: What Works and What’s Next

He, Lee, Lewis, et al. (2017), Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a new deep learning model for semantic role labeling (SRL) that significantly improves the state of the art, along with detailed analyses to reveal its strengths and limitations. We use a deep highway BiLSTM architecture with constrained decoding, while observing a number of recent best practices for initialization and regularization. Our 8-layer ensemble model achieves 83.2 F1 on the CoNLL 2005 test set and 83.4 F1 on CoNLL 2012, roughly a 10% relative error reduction over the previous state of the art. Extensive empirical analysis of these gains shows that (1) deep models excel at recovering long-distance dependencies but can still make surprisingly obvious errors, and (2) there is still room for syntactic parsers to improve these results.
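The ensemble mentioned above is combined with a product of experts: the component models' probabilities are multiplied and the result is renormalized. The minimal Python sketch below applies that rule to per-token tag distributions. Applying it per token and the function name are assumptions for illustration, not a description of He et al.'s implementation; the combined distribution can then be decoded like the output of a single model.

```python
import math
from typing import Dict, List

def product_of_experts(dists: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-token tag distributions from several models by multiplying
    probabilities (summing log-probabilities) and renormalizing.
    All probabilities must be strictly positive."""
    tags = list(dists[0].keys())
    log_scores = {tag: sum(math.log(d[tag]) for d in dists) for tag in tags}
    m = max(log_scores.values())  # subtract the max before exp for numerical stability
    unnorm = {tag: math.exp(s - m) for tag, s in log_scores.items()}
    z = sum(unnorm.values())
    return {tag: v / z for tag, v in unnorm.items()}

# Products 0.4 and 0.1 renormalize to approximately {'O': 0.8, 'B-A0': 0.2}.
print(product_of_experts([{"O": 0.5, "B-A0": 0.5}, {"O": 0.8, "B-A0": 0.2}]))
```

Unlike averaging, a product of experts lets any single confident model veto a tag, since one near-zero factor drives the product toward zero.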

“…This includes context-sensitive-chunk BLSTM (CSC-BLSTM) [25] and latency-controlled BLSTM (LC-BLSTM) [26]. Figure 2 shows the differences among these approaches.…”
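Local-window BLSTMs such as CSC-BLSTM avoid running the backward pass over an entire utterance by processing fixed-size chunks, each extended with extra context frames whose outputs are discarded. The minimal Python sketch below shows only that chunk-and-context bookkeeping; the chunk and context sizes are hypothetical parameters, and it ignores any recurrent state passed between chunks, which latency-controlled variants use.

```python
from typing import List, Tuple

def make_chunks(num_frames: int, chunk: int, left: int, right: int) -> List[Tuple[int, int, int, int]]:
    """Split an utterance into chunks of `chunk` frames plus context.
    Returns (start, end, out_start, out_end): the BLSTM runs over frames
    [start, end), but only outputs for [out_start, out_end) are kept, so every
    frame's output is emitted exactly once."""
    chunks = []
    for out_start in range(0, num_frames, chunk):
        out_end = min(out_start + chunk, num_frames)
        start = max(0, out_start - left)        # left context, clipped at utterance start
        end = min(num_frames, out_end + right)  # right context, clipped at utterance end
        chunks.append((start, end, out_start, out_end))
    return chunks

print(make_chunks(10, chunk=4, left=2, right=1))
# [(0, 5, 0, 4), (2, 9, 4, 8), (6, 10, 8, 10)]
```

The right-context size bounds how far the backward direction looks ahead, so it sets the decoding latency, while the cost per chunk no longer depends on utterance length.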

Section: Local Window BLSTM (mentioning)

confidence: 99%

Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

Kang, Zhang, Liu, et al. (2018), EURASIP Journal on Audio, Speech, and Music Processing

Recurrent neural networks (RNNs) can model temporal dependencies. However, exploding or vanishing gradients have limited their application. In recent years, long short-term memory RNNs (LSTM RNNs) have been proposed to solve this problem and have achieved excellent results. Bidirectional LSTM (BLSTM), which uses both preceding and following context, has shown particularly good performance. However, BLSTM approaches are computationally heavy, even when implemented efficiently on GPU-based high-performance computers. In addition, because the output of LSTM units is bounded, vanishing gradients often remain a problem across multiple layers. The large size of LSTM networks also makes them susceptible to overfitting. In this work, we combine a local bidirectional architecture, a new recurrent unit (the gated recurrent unit, GRU), and residual architectures to address these problems. Experiments are conducted on the benchmark datasets released under the IARPA Babel Program. The proposed models achieve 3 to 10% relative improvements over their corresponding DNN or LSTM baselines across seven language collections. In addition, the new models speed up learning by a factor of more than 1.6 compared with conventional BLSTM models. Using these approaches, we achieve good results in the IARPA Babel Program.
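The abstract combines gated recurrent units with residual connections. The minimal Python sketch below shows one common GRU formulation (update gate z, reset gate r, candidate state) and an identity shortcut that adds each layer's input to its output. The parameter names (`W_z`, `U_z`, `b_z`, and so on) are hypothetical, GRU gate conventions vary between papers, and this is not the authors' implementation.

```python
import math
from typing import Dict, List

Vec = List[float]
Mat = List[List[float]]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(w: Mat, x: Vec) -> Vec:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def add(*vs: Vec) -> Vec:
    # Element-wise sum of equally sized vectors.
    return [sum(parts) for parts in zip(*vs)]

def gru_step(x: Vec, h: Vec, p: Dict[str, list]) -> Vec:
    """One GRU update: gates z and r, candidate state, then interpolation."""
    z = [sigmoid(v) for v in add(matvec(p["W_z"], x), matvec(p["U_z"], h), p["b_z"])]
    r = [sigmoid(v) for v in add(matvec(p["W_r"], x), matvec(p["U_r"], h), p["b_r"])]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
    cand = [math.tanh(v) for v in add(matvec(p["W_h"], x), matvec(p["U_h"], rh), p["b_h"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def residual_gru_layer(xs: List[Vec], p: Dict[str, list]) -> List[Vec]:
    """Run a GRU over a sequence; each output frame is the GRU state plus the
    input frame (identity shortcut), so input and hidden sizes must match."""
    h = [0.0] * len(p["b_z"])
    outputs = []
    for x in xs:
        h = gru_step(x, h, p)
        outputs.append([hi + xi for hi, xi in zip(h, x)])
    return outputs
```

The recurrent state stays bounded by the tanh candidate, but the shortcut adds an unbounded identity path around each layer, which is the kind of fix for the multi-layer vanishing-gradient issue that the abstract attributes to bounded unit outputs. If input and hidden sizes differ, a projection would be needed on the shortcut.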
