2022
DOI: 10.1109/access.2022.3159339
Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

Abstract: Automatic speech recognition (ASR) is one of the most demanding tasks in natural language processing due to its complexity. Recently, deep learning approaches have been deployed for this task and have been shown to outperform traditional machine learning approaches such as ANNs. In particular, deep learning methods such as Long Short-Term Memory (LSTM) networks have achieved improved performance in ASR. However, this method is limited in processing continuous input streams. A traditional LSTM requires 4 linear layers (M…

Cited by 65 publications (23 citation statements)
References 42 publications
“…The performance of automatic voice recognition is primarily affected by pooling size rather than pooling-layer overlap [2]. The CNN-based strategy for voice recognition outperforms the traditional ANN-based approach in terms of accuracy [3]. Based on features extracted from the Bark spectrogram, a convolutional neural network (CNN) is utilized to classify spoken-digit recognition data [4]. By creating phone classes from raw speech signals, a CNN performs better than other parameter-based approaches [3]. A technique called the stride-based convolutional neural network (SCNN) reduces the number of convolutional layers and eliminates the pooling layers in order to boost computational stability [5]. Using knowledge distillation, it may be possible to convert a BiLSTM model into a low-latency end-to-end UniLSTM model [6]. The Li-GRU layer significantly lowers computational complexity and boosts recognition performance, saving more than 30% of training time compared to a standard GRU [7]. There is a trade-off between learning rate and accuracy when training an LSTM-RNN [8]. Using a 15-layer deep network, convolutional LSTMs obtain a word error rate of 10.5% without using a dictionary or language model [3].…”
Section: Related Work
confidence: 99%
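The BiLSTM-to-UniLSTM conversion cited as [6] rests on knowledge distillation. The following is a minimal sketch of the generic technique only, assuming PyTorch; the temperature, loss weighting, and the teacher/student models are illustrative assumptions, not details from [6].

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft-label KL divergence.

    `temperature` and `alpha` are illustrative hyperparameters,
    not values reported in [6].
    """
    # Soft targets: KL divergence between the softened distributions
    # of student and teacher; scaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: the teacher would be a frozen BiLSTM acoustic model,
# the student a unidirectional (low-latency) LSTM.
# with torch.no_grad():
#     teacher_logits = teacher(features)
# loss = distillation_loss(student(features), teacher_logits, labels)
```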
“…With the help of this property, LSTMs may process whole time-series data sequences without considering each data point separately. Instead, they can process new data points by using information from past data in the sequence [23]. The previous state of the input sequence is kept in the memory cells of the LSTM.…”
Section: Long Short-Term Memory Network (LSTM)
confidence: 99%
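As a concrete illustration of the stepwise processing this statement describes, here is a minimal sketch, assuming PyTorch (the feature and layer sizes are arbitrary, not taken from the paper): an LSTM cell carries its hidden state and memory cell forward, so each new frame is interpreted in the context of all past frames rather than in isolation.

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len = 13, 32, 50  # e.g. 13 MFCC features per frame

cell = nn.LSTMCell(input_size, hidden_size)
x = torch.randn(seq_len, 1, input_size)        # one sequence, fed frame by frame

# The hidden state h and memory cell c summarize everything seen so far.
h = torch.zeros(1, hidden_size)
c = torch.zeros(1, hidden_size)

for t in range(seq_len):
    # Each new frame is processed together with the state carried over
    # from past frames, so no time step is treated separately.
    h, c = cell(x[t], (h, c))
```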
“…They should enhance their design capabilities, create their own intellectual property rights, and shorten product lead time and the new-product launch cycle in order to reduce customers' costs and improve their sales efficiency [19].…”
Section: Implementation Strategy
confidence: 99%
“…It has been verified that the LSTM can better deal with the vanishing and exploding gradients caused by long-term dependencies [27]. As a result, the LSTM has gained widespread interest in natural language processing [28], speech recognition [29], and industrial process time-series data [30]. Since the input of the LSTM focuses only on the dynamic information of process variables, introducing quality information into the cell structure to enhance its dynamic extraction is not considered, which biases the correlation between the hidden state and the dynamic quality information in the LSTM network.…”
Section: Introduction
confidence: 99%
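For reference, the gradient claim in this statement rests on the standard LSTM cell equations (given here in textbook form, not reproduced from [27]); the additive update of the cell state c_t is what allows gradients to flow across long time lags:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(additive cell update)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```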