Spontaneous speech emotion recognition is a new and challenging research topic. In this paper, we propose a new method for spontaneous speech emotion recognition based on binaural representations and deep convolutional neural networks (CNNs). The proposed method first employs multiple CNNs to learn deep segment-level binaural representations, such as Left-Right and Mid-Side pairs, from the extracted image-like Mel-spectrograms. These CNNs are fine-tuned on target emotional speech datasets from a pre-trained image CNN model. Then, a new feature pooling strategy, called block-based temporal feature pooling, is proposed to aggregate the learned segment-level features into fixed-length utterance-level features. Based on the utterance-level features, a linear support vector machine (SVM) is adopted for emotion classification. Finally, a two-stage score-level fusion strategy is used to integrate the results obtained from the Left-Right and Mid-Side pairs. Extensive experiments on two challenging spontaneous emotional speech datasets, the AFEW5.0 and BAUM-1s databases, demonstrate the effectiveness of the proposed method.
INDEX TERMS Spontaneous speech emotion recognition, binaural representations, deep convolutional neural networks, temporal feature pooling.
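The Mid-Side representation and the block-based temporal pooling step named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the function names `mid_side` and `block_temporal_pool` are hypothetical, the Mid/Side conversion uses the common audio convention M = (L + R) / 2 and S = (L - R) / 2, and the pooling averages segment-level feature vectors within each of `num_blocks` consecutive temporal blocks before concatenating them. The paper's actual pooling operator and block count may differ.

```python
def mid_side(left, right):
    """Convert Left/Right channel samples to Mid/Side.

    Assumes the common convention M = (L + R) / 2, S = (L - R) / 2;
    the paper's exact definition is not stated in the abstract.
    """
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def block_temporal_pool(segment_features, num_blocks=3):
    """Pool a variable number of segment-level vectors into one
    fixed-length vector of size num_blocks * dim.

    Average pooling per block is an illustrative assumption.
    segment_features: non-empty list of equal-length feature vectors,
    ordered in time.
    """
    n = len(segment_features)
    dim = len(segment_features[0])
    pooled = []
    for b in range(num_blocks):
        # Integer boundaries split the sequence into near-equal blocks.
        start = b * n // num_blocks
        end = (b + 1) * n // num_blocks
        block = segment_features[start:end]
        for d in range(dim):
            if block:
                pooled.append(sum(row[d] for row in block) / len(block))
            else:
                # Fewer segments than blocks: pad the empty block with zeros.
                pooled.append(0.0)
    return pooled
```

Whatever the utterance length, the output always has `num_blocks * dim` entries, which is what allows a fixed-input classifier such as a linear SVM to consume it.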
Air quality forecasting is of great importance for environmental protection, government decision-making, and people's daily health. Existing methods fail to effectively model long-term and complex relationships in PM2.5 time series data and exhibit low precision in long-term prediction. To address this issue, this paper proposes a new lightweight deep learning model based on sparse attention-based Transformer networks (STN). The model consists of encoder and decoder layers in which a multi-head sparse attention mechanism is adopted to reduce time complexity, and it learns long-term dependencies and complex relationships from PM2.5 time series data for air quality forecasting. Extensive experiments on two real-world datasets from China, the Beijing PM2.5 dataset and the Taizhou PM2.5 dataset, show that the proposed method not only has relatively small time complexity but also outperforms state-of-the-art methods, demonstrating the effectiveness of STN on both short-term and long-term air quality prediction tasks. In particular, on single-step PM2.5 forecasting tasks, the proposed method achieves an R² of 0.937 and reduces RMSE to 19.04 µg/m³ and MAE to 11.13 µg/m³ on the Beijing PM2.5 dataset. It also obtains an R² of 0.924 and reduces RMSE to 5.79 µg/m³ and MAE to 3.76 µg/m³ on the Taizhou PM2.5 dataset. For long-term prediction, the proposed method still performs best among all compared methods on multi-step PM2.5 forecasting for the next 6, 12, 24, and 48 h on both datasets.
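For readers less familiar with the three metrics reported above, the following is a minimal sketch of their standard textbook definitions. It is not the authors' evaluation code, the function names are hypothetical, and the toy values in the usage note are invented for illustration only and are not taken from either dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (SS_tot > 0).
    """
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

As a usage illustration with made-up values, `y_true = [10, 20, 30, 40]` and `y_pred = [12, 18, 33, 39]` give an MAE of 2.0, an RMSE of about 2.12, and an R² of 0.964. Lower RMSE and MAE and a higher R² indicate a better fit.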