Projected Minimal Gated Recurrent Unit for Speech Recognition

Ran, Feng; Jiang, Weijie; Yu, Ning; Yue, Wu; Yan, Jiaxuan

doi:10.1109/access.2020.3041477

Cited by 4 publications

(3 citation statements)

References 36 publications

(43 reference statements)

“…Feng et al [26] proposed a Projected minimal Gated Recurrent Unit (PmGRU) an improved version of mGRUIP-Ctx for speech recognition acoustic model on five different ASR tasks. The proposed model has shown significant reduction in Word Error Rate (WER) compared with the WER of the mGRUIP-Ctx.…”

Section: B Deep Learning Based Methods For Automatic Speech Recogniti... (mentioning)

confidence: 99%

Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

2022

Automatic speech recognition (ASR) is one of the most demanding tasks in Natural Language Processing due to its complexity. Recently, deep learning approaches have been deployed for this task and have been shown to outperform traditional machine learning approaches such as ANN. In particular, deep learning methods such as Long Short-Term Memory (LSTM) have achieved improved performance in ASR. However, this method is limited in processing continuous input streams. A traditional LSTM requires 4 linear layers (MLP layers) per cell, which demand large amounts of memory bandwidth to run at each sequence time-step. LSTM cannot afford the many computational units required to process continuous input streams because the system does not have enough memory bandwidth to feed them. In this research, an enhanced deep learning LSTM RNN model is proposed to resolve this shortcoming. In the proposed model, a Recurrent Neural Network (RNN) is incorporated as a "forget gate" to the memory block to allow resetting of the cell states at the beginning of sub-sequences. This enables the system to process continuous input streams efficiently without necessarily increasing the required bandwidth. In the proposed model, the standard architecture of the LSTM network has been modified to make effective use of the model parameters and to address the computational efficiency problems of large networks on large-vocabulary speech recognition. Some CNN-based models and Sequential models were also used on the same dataset, and their performance was compared with that of the proposed model. The LSTM-RNN outperformed the other deep learning models, with an accuracy of 99.36% on the well-established public benchmark spoken English digits dataset.

“…First, the temporal feature extraction network can extract the variation characteristics of the time series in both temporal and spatial dimensions from multiple dimensions. Second, the feature extraction network based on the gated recurrent unit is more advantageous than the LSTM network in handling large batches of temporal data, which can significantly reduce the training time of the model while ensuring the accuracy of the prediction (Feng et al, 2020). The six kinds of water quality information were passed through the GRU module to adjust the parameter states between the hidden layers.…”

Section: System Structure (mentioning)

confidence: 99%

A dissolved oxygen prediction model based on GRU–N-Beats

Hao

2024

Front. Mar. Sci.

Dissolved oxygen is one of the most important water quality parameters in aquaculture, and the level determines whether fish can grow healthily. Since there is a delay in equipment control in the aquaculture environment, dissolved oxygen prediction is needed to reduce the loss due to low dissolved oxygen. To solve the problem of insufficient accuracy and poor interpretability of traditional methods in predicting dissolved oxygen from multivariate water quality parameters, this paper proposes an improved N-Beats-based prediction network. First, the maximum expectation algorithm [expectation–maximization (EM)] was used to fill in the original data by fitting the missing values. Second, the discrete wavelet transform (DWT) was used to reduce the overall noise of the sample, then the gated recurrent unit (GRU) feature extraction network was employed to extract the water quality information from the temporal dimension, the N-Beats was utilized to predict the preprocessed data, and the residual operation through Stack was performed to obtain the prediction results. The improved algorithm overcomes the challenge of insufficient prediction accuracy of the traditional algorithm. The GRU–N-Beats network proposed in this paper can extract features from multivariate time dimensions for prediction. The values of root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and R2 for the proposed algorithm were 0.171, 0.120, 0.015, and 0.97, respectively. In particular, they were 28.5%, 32.1%, 51.6%, 24.3%, 14.9%, 36.4%, and 19.3% higher than those of long short-term memory (LSTM), GRU, temporal convolutional network (TCN), LSTM–TCN, PatchTST, back-propagation neural network (BPNN), and N-Beats on RMSE, respectively.

“…With the development of deep neural network technology, no matter in target recognition [1]-[3], object detection [4], semantic segmentation [5], speech recognition [6], [7], or in text translation [8], these learning models based on deep neural networks have achieved significant progress. The success of these models depends on a large quantity of training samples.…”

Section: Introduction (mentioning)

confidence: 99%

Feature Transformation Network for Few-Shot Learning

Wang

Zhou

2021

IEEE Access

Few-shot learning aims to learn a novel concept from a handful of labeled samples. Due to the small amount of training data, deep networks risk over-fitting. Although many previous approaches based on metric criteria make significant progress on this challenge, they not only ignore the association between the query set and the support set when learning sample representations, but also fail to focus attention on the target area. To cope with these issues, we propose a novel feature transformation network (FTN) for few-shot image classification. Specifically, to draw inferences about other instances from only a few examples, the model is expected to learn a more discriminative representation of the target attributes and a robust generalization ability. To this end, we introduce an attention-based affinity matrix to transform the semantically enhanced embedding vectors of query samples by associating them with the support set, thereby guiding the network to learn a sample representation that embodies higher semantic information in the target area. Furthermore, to highlight the object region in the feature maps and strengthen the pertinence of the similarity measurement between samples, a global and local feature fusion module is designed to fuse the features of the support set samples. Comprehensive experiments validate the feasibility of our model, and our method achieves state-of-the-art performance on two public benchmark datasets, namely the general object dataset mini-ImageNet and the fine-grained dataset Caltech-UCSD Birds-200-2011 (CUB).

Index terms: Few-shot learning, feature transformation, feature fusion, metric criterion.
