Shiqing Zhang scite author profile

Spontaneous speech emotion recognition is a new and challenging research topic. In this paper, we propose a new method for spontaneous speech emotion recognition based on binaural representations and deep convolutional neural networks (CNNs). The proposed method first employs multiple CNNs to learn deep segment-level binaural representations, such as Left-Right and Mid-Side pairs, from the extracted image-like Mel-spectrograms. These CNNs are fine-tuned on target emotional speech datasets from a pre-trained image CNN model. Then, a new feature pooling strategy, called block-based temporal feature pooling, is proposed to aggregate the learned segment-level features into fixed-length utterance-level features. Based on the utterance-level features, a linear support vector machine (SVM) is adopted for emotion classification. Finally, a two-stage score-level fusion strategy is used to integrate the results obtained from the Left-Right and Mid-Side pairs. Extensive experiments on two challenging spontaneous emotional speech datasets, the AFEW5.0 and BAUM-1s databases, demonstrate the effectiveness of the proposed method.

Index terms: Spontaneous speech emotion recognition, binaural representations, deep convolutional neural networks, temporal feature pooling.
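The abstract names two steps that are easy to picture in code: converting a Left-Right channel pair into a Mid-Side pair, and pooling a variable number of segment-level feature vectors into one fixed-length utterance-level vector. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes the common Mid-Side convention (half the sum and half the difference of the channels) and assumes that "block-based" pooling means splitting the segment sequence into a fixed number of contiguous blocks, averaging within each block, and concatenating the results. The function names `left_right_to_mid_side` and `block_average_pool` are hypothetical.

```python
from typing import List, Sequence, Tuple


def left_right_to_mid_side(
    left: Sequence[float], right: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Convert Left/Right samples to Mid/Side samples.

    Assumption: mid = (L + R) / 2 and side = (L - R) / 2, a common
    convention; the abstract does not state which scaling is used.
    """
    if len(left) != len(right):
        raise ValueError("left and right channels must have equal length")
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def block_average_pool(
    segments: Sequence[Sequence[float]], num_blocks: int
) -> List[float]:
    """Pool segment-level features into one fixed-length vector.

    Assumption: the segment sequence is split into `num_blocks`
    contiguous blocks, each block is average-pooled per dimension,
    and the block averages are concatenated. The output length is
    always num_blocks * feature_dim, whatever the number of segments.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    if num_blocks < 1:
        raise ValueError("num_blocks must be positive")
    n = len(segments)
    dim = len(segments[0])
    pooled: List[float] = []
    for b in range(num_blocks):
        start = b * n // num_blocks
        end = (b + 1) * n // num_blocks
        if end <= start:
            # Fewer segments than blocks: fall back to a single segment.
            start = min(start, n - 1)
            end = start + 1
        block = segments[start:end]
        pooled.extend(
            sum(seg[d] for seg in block) / len(block) for d in range(dim)
        )
    return pooled
```

Under these assumptions, an utterance with 7 segments and one with 40 segments both map to a vector of `num_blocks * feature_dim` values, which is what a fixed-input classifier such as a linear SVM requires.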

Speech Emotion Recognition by Combining a Unified First-Order Attention Network With Data Balance

Chen, Zhang, Xin, et al. 2020. IEEE Access

Modeling air quality PM2.5 forecasting using deep sparse attention-based transformer networks

Zhang, 2023. Int. J. Environ. Sci. Technol.

Air quality forecasting is of great importance for environmental protection, government decision-making, and people's daily health. Existing research methods have failed to effectively model long-term and complex relationships in time series PM2.5 data and have exhibited low precision in long-term prediction. To address this issue, this paper proposes a new lightweight deep learning model using sparse attention-based Transformer networks (STN). The model consists of encoder and decoder layers, in which a multi-head sparse attention mechanism is adopted to reduce the time complexity, and it learns long-term dependencies and complex relationships from time series PM2.5 data for air quality forecasting. Extensive experiments on two real-world datasets in China, the Beijing PM2.5 dataset and the Taizhou PM2.5 dataset, show that the proposed method not only has relatively small time complexity but also outperforms state-of-the-art methods, demonstrating the effectiveness of the proposed STN method on both short-term and long-term air quality prediction tasks. In particular, on single-step PM2.5 forecasting tasks, the proposed method achieves an R² of 0.937 and reduces RMSE to 19.04 µg/m³ and MAE to 11.13 µg/m³ on the Beijing PM2.5 dataset. It also obtains an R² of 0.924 and reduces RMSE to 5.79 µg/m³ and MAE to 3.76 µg/m³ on the Taizhou PM2.5 dataset. For long-term prediction, the proposed method still performs best among all compared methods on multi-step PM2.5 forecasting for the next 6, 12, 24, and 48 h on both datasets.
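The abstract reports results as R², RMSE, and MAE. As a reminder of what these numbers measure, here is a minimal sketch of the three metrics using their standard textbook definitions. It is not taken from the paper, and the function names `rmse`, `mae`, and `r2_score` are chosen here for illustration. RMSE and MAE carry the units of the target (µg/m³ for PM2.5), while R² is unitless.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both series must be non-empty and aligned sample by sample.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    _check(y_true, y_pred)
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean(|y - y_hat|)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, so it is never smaller than MAE on the same predictions.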

Wi-LADL: A Wireless-Based Lightweight Attention Deep Learning Method for Human–Vehicle Recognition

et al. 2023

An Environmental Energy Harvesting-Driven Wireless Parking Detection Method: Analysis and Implementation

Lou, Zhou, Chen, et al. 2023. IEEE Trans. Intell. Transport. Syst.

Shiqing Zhang

Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching

Learning Affective Features With a Hybrid Deep Model for Audio–Visual Emotion Recognition

Spontaneous Speech Emotion Recognition Using Multiscale Deep Convolutional LSTM

Learning Deep Binaural Representations With Deep Convolutional Neural Networks for Spontaneous Speech Emotion Recognition
