2022
DOI: 10.1109/access.2022.3159339
Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

Abstract: Automatic speech recognition (ASR) is one of the most demanding tasks in natural language processing due to its complexity. Recently, deep learning approaches have been deployed for this task and have been shown to outperform traditional machine learning approaches such as ANNs. In particular, deep learning methods such as Long Short-Term Memory (LSTM) networks have achieved improved performance in ASR. However, this method is limited in processing continuous input streams. A traditional LSTM requires 4 linear layers (M…

Cited by 65 publications (23 citation statements)
References 42 publications
“…The performance of automatic voice recognition is primarily affected by pooling size rather than pooling-layer overlap [2]. The CNN-based strategy for voice recognition outperforms the traditional ANN-based approach in terms of accuracy [3]. Based on features extracted from the Bark spectrogram, a convolutional neural network (CNN) is utilized to classify spoken-digit recognition data [4]. By creating phone classes from raw speech signals, a CNN performs better than other parameter-based approaches [3]. A technique called the stride-based convolutional neural network (SCNN) reduces the number of convolutional layers and eliminates the pooling layers in order to boost computational stability [5]. Using knowledge distillation, it may be possible to convert a BiLSTM model into a low-latency end-to-end UniLSTM model [6]. The Li-GRU layer significantly lowers computational complexity and boosts recognition performance, saving more than 30% of training time compared to a standard GRU [7]. There is a trade-off between learning rate and accuracy when training an LSTM-RNN [8]. Using a 15-layer deep network, convolutional LSTMs obtain a word error rate of 10.5% without using a dictionary or language model [3].…”
Section: Related Work
confidence: 99%
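The BiLSTM-to-UniLSTM conversion cited as [6] rests on knowledge distillation. The following is a minimal sketch of the generic technique only, assuming PyTorch; the temperature, loss weighting, and the teacher/student models are illustrative assumptions, not details from [6].

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft-label KL divergence.

    `temperature` and `alpha` are illustrative hyperparameters,
    not values reported in [6].
    """
    # Soft targets: KL divergence between the softened distributions
    # of student and teacher; scaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: the teacher would be a frozen BiLSTM acoustic model,
# the student a unidirectional (low-latency) LSTM.
# with torch.no_grad():
#     teacher_logits = teacher(features)
# loss = distillation_loss(student(features), teacher_logits, labels)
```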
“…With the help of this property, LSTMs may process whole time-series data sequences without considering each data point separately. Instead, they can process new data points by using information from past data in the sequence [23]. The previous state of the input sequence is kept in the memory cells of the LSTM.…”
Section: Long Short-Term Memory Network (LSTM)
confidence: 99%
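As a concrete illustration of the stepwise processing this statement describes, here is a minimal sketch, assuming PyTorch (the feature and layer sizes are arbitrary, not taken from the paper): an LSTM cell carries its hidden state and memory cell forward, so each new frame is interpreted in the context of all past frames rather than in isolation.

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len = 13, 32, 50  # e.g. 13 MFCC features per frame

cell = nn.LSTMCell(input_size, hidden_size)
x = torch.randn(seq_len, 1, input_size)        # one sequence, fed frame by frame

# The hidden state h and memory cell c summarize everything seen so far.
h = torch.zeros(1, hidden_size)
c = torch.zeros(1, hidden_size)

for t in range(seq_len):
    # Each new frame is processed together with the state carried over
    # from past frames, so no time step is treated separately.
    h, c = cell(x[t], (h, c))
```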
“…They should enhance their design capabilities, create their own intellectual property rights, and shorten product lead time and the new-product launch cycle in order to reduce customers' costs and improve their sales efficiency [19].…”
Section: Implementation Strategy
confidence: 99%
“…It has been verified that the LSTM can better deal with the vanishing and exploding gradients caused by long-term dependencies [27]. As a result, the LSTM has gained widespread interest in natural language processing [28], speech recognition [29], and industrial process time-series data [30]. Since the input of the LSTM focuses only on the dynamic information of process variables, introducing quality information into the cell structure to enhance its dynamic extraction is not considered, which biases the correlation between the hidden state and the dynamic quality information in the LSTM network.…”
Section: Introduction
confidence: 99%
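For reference, the gradient claim in this statement rests on the standard LSTM cell equations (given here in textbook form, not reproduced from [27]); the additive update of the cell state c_t is what allows gradients to flow across long time lags:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(additive cell update)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```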