Stride Based Convolutional Neural Network for Speech Emotion Recognition

Wani, Taiba Majid; Gunawan, Teddy Surya; Qadri, Syed Asif Ahmad; Mansor, Hasmah; Arifin, Fatchul; Ahmad, Yasser Asrul

doi:10.1109/icsima50015.2021.9526320

Cited by 8 publications

(3 citation statements)

References 14 publications


“…The performance of Automatic voice recognition is primarily affected by pooling size rather than pooling layer overlap [2]. The CNN-based strategy for voice recognition outperforms the traditional ANN-based approach in terms of accuracy [3]. Based on features extracted from the Bark spectrogram, a convolutional neural network (CNN) is utilized to categorize spoken digit recognition data [4]. By creating phone classes from raw speech signals, CNN performs better than other parameter-based approaches [3]. A unique technique called the stride-based convolutional neural network (SCNN) reduces the number of convolutional layers and eliminates the pooling layers in order to boost computational stability [5]. Using knowledge distillation, it may be possible to convert a BiLSTM model into a low-latency end-to-end UniLSTM model [6]. Li-GRU layer significantly lowers computational complexity and boosts recognition performance, saving more than 30% of training time compared to a standard GRU [7]. There is a trade off between learning rate and accuracy when trained through LSTM-RNN [8]. Using a 15-layer deep network, convolutional LSTMs obtain a word mistake rate of 10.5% without using a dictionary or language model [3].…”

Section: Related Work (mentioning)

confidence: 99%

Performance Comparison of Various Neural Networks for Speech Recognition

Charan, Kanhe

2023

J. Phys.: Conf. Ser.


Speech recognition is a method where an audio signal is translated into text, words, or commands and also tells how the speech is recognized. Recently, many deep learning models have been adopted for automatic speech recognition and proved more effective than traditional machine learning methods like Artificial Neural Networks (ANN). This work examines the efficient learning architectures of features by different deep neural networks. In this paper, five neural network models, namely CNN, LSTM, Bi-LSTM, GRU, and Conv-LSTM, are compared. We trained the networks using the Audio MNIST dataset for three different iterations and evaluated them based on performance metrics. Experimentally, the CNN and Conv-LSTM models consistently offer the best performance based on MFCC features.
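The first citation statement above describes the SCNN as removing pooling layers. One common way to do that, assumed here and not confirmed by this page, is to let the convolution's stride do the downsampling. The sketch below is a minimal NumPy illustration of that idea on a toy 1-D signal; the helper names `conv1d` and `max_pool1d`, the kernel, and the input are all hypothetical and are not taken from the cited paper.

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Valid (no padding) 1-D cross-correlation with a given stride."""
    k = len(kernel)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + k], kernel)
                     for i in range(n_out)])

def max_pool1d(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    n_out = len(x) // size
    return x[: n_out * size].reshape(n_out, size).max(axis=1)

# Toy input and kernel (illustrative values only).
x = np.arange(16, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])

# Conventional route: stride-1 convolution followed by a pooling layer.
pooled = max_pool1d(conv1d(x, kernel, stride=1), size=2)

# Stride-based route: a single stride-2 convolution, no pooling layer.
strided = conv1d(x, kernel, stride=2)

print(pooled.shape, strided.shape)
```

Both routes reduce the 16-sample input to the same output length, but the strided version computes roughly half as many convolution windows and needs no separate pooling step.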


“…Although the methods discussed above can achieve good accuracy, most require artificial feature engineering, which involves massive human professional knowledge and is usually a very time-consuming process. In recent years, deep learning has made significant progress in machine learning and has been widely applied in various fields such as digital image recognition, speech recognition, and steganography analysis [20][21][22][23][24]. Among them, Xu et al [20] proposed a multiscale attention network for splicing tampering forensics in image recognition, which utilizes the integration of residual attention and multiscale information in order to improve the detection accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

“…Lang et al [21] applied deep learning technology to the field of industrial defect detection, aiming to improve the accuracy of magnetic flux leakage (MFL) image recognition of pipeline corrosion defects, and achieved remarkable results. Taiba et al [22] proposed a stride-based convolutional neural network (SCNN) model for speech emotion recognition. Banerjee et al [23] applied deep learning in bio-signal steganography to provide robust, undetectable and trustworthy information security techniques.…”

Section: Introduction (mentioning)

confidence: 99%

Pyramid Feature Attention Network for Speech Resampling Detection

Zhou, Zhang, Wang et al. 2024

Applied Sciences


Speech forgery and tampering, increasingly facilitated by advanced audio editing software, pose significant threats to the integrity and privacy of digital speech avatars. Speech resampling is a post-processing operation of various speech-tampering means, and the forensic detection of speech resampling is of great significance. For speech resampling detection, most of the previous works used traditional methods of feature extraction and classification to distinguish original speech from forged speech. In view of the powerful ability of deep learning to extract features, this paper converts the speech signal into a spectrogram with time-frequency characteristics, and uses the feature pyramid network (FPN) with the Squeeze and Excitation (SE) attention mechanism to learn speech resampling features. The proposed method combines the low-level location information and the high-level semantic information, which dramatically improves the detection performance of speech resampling. Experiments were carried out on a resampling corpus made on the basis of the TIMIT dataset. The results indicate that the proposed method significantly improved the detection accuracy of various resampled speech. For the tampered speech with a resampling factor of 0.9, the detection accuracy is increased by nearly 20%. In addition, the robustness test demonstrates that the proposed model has strong resistance to MP3 compression, and the overall performance is better than the existing methods.
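The abstract above mentions a Squeeze and Excitation (SE) attention mechanism applied to spectrogram features. As a rough illustration of what an SE block does in general, the NumPy sketch below pools each channel to a single value, passes the result through a small bottleneck, and uses sigmoid gates to rescale the channels. The function name `se_block`, the random weights, the reduction ratio, and the omission of bias terms are assumptions made for brevity; this is not the cited paper's network.

```python
import numpy as np

def se_block(features, w1, w2):
    """Squeeze-and-Excitation over a (channels, height, width) feature map."""
    # Squeeze: global average pooling gives one value per channel -> (C,)
    squeezed = features.mean(axis=(1, 2))
    # Excitation: bottleneck layer with ReLU -> (C // r,)
    hidden = np.maximum(w1 @ squeezed, 0.0)
    # Sigmoid turns the expanded vector into per-channel gates in (0, 1) -> (C,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Rescale every channel of the input by its gate.
    return features * gates[:, None, None], gates

rng = np.random.default_rng(0)
channels, reduction = 8, 4                      # illustrative sizes
features = rng.standard_normal((channels, 16, 16))
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1

reweighted, gates = se_block(features, w1, w2)
print(reweighted.shape, gates.shape)
```

In a trained model the two weight matrices are learned, so the gates emphasize channels that carry useful cues and damp the rest.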


Emotion Recognition of Speech

Sacheth, Jayashree

2023

Lecture Notes in Networks and Systems
