2021 IEEE 7th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA) 2021
DOI: 10.1109/icsima50015.2021.9526320

Stride Based Convolutional Neural Network for Speech Emotion Recognition

Cited by 8 publications (3 citation statements)
References 14 publications
“…The performance of Automatic voice recognition is primarily affected by pooling size rather than pooling layer overlap [2]. The CNN-based strategy for voice recognition outperforms the traditional ANN-based approach in terms of accuracy [3]. Based on features extracted from the Bark spectrogram, a convolutional neural network (CNN) is utilized to categorize spoken digit recognition data [4]. By creating phone classes from raw speech signals, CNN performs better than other parameter-based approaches [3]. A unique technique called the stride-based convolutional neural network (SCNN) reduces the number of convolutional layers and eliminates the pooling layers in order to boost computational stability [5]. Using knowledge distillation, it may be possible to convert a BiLSTM model into a low-latency end-to-end UniLSTM model [6]. Li-GRU layer significantly lowers computational complexity and boosts recognition performance, saving more than 30% of training time compared to a standard GRU [7]. There is a trade off between learning rate and accuracy when trained through LSTM-RNN [8]. Using a 15-layer deep network, convolutional LSTMs obtain a word mistake rate of 10.5% without using a dictionary or language model [3].…”
Section: Related Work (mentioning)
confidence: 99%
“…Although the methods discussed above can achieve good accuracy, most require artificial feature engineering, which involves massive human professional knowledge and is usually a very time-consuming process. In recent years, deep learning has made significant progress in machine learning and has been widely applied in various fields such as digital image recognition, speech recognition, and steganography analysis [20][21][22][23][24]. Among them, Xu et al [20] proposed a multiscale attention network for splicing tampering forensics in image recognition, which utilizes the integration of residual attention and multiscale information in order to improve the detection accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
“…Lang et al [21] applied deep learning technology to the field of industrial defect detection, aiming to improve the accuracy of magnetic flux leakage (MFL) image recognition of pipeline corrosion defects, and achieved remarkable results. Taiba et al [22] proposed a stride-based convolutional neural network (SCNN) model for speech emotion recognition. Banerjee et al [23] applied deep learning in bio-signal steganography to provide robust, undetectable and trustworthy information security techniques.…”
Section: Introduction (mentioning)
confidence: 99%