baseline systems on the three proposed tasks: state-of-mind recognition, depression assessment with AI, and cross-cultural affect sensing, respectively.
The automatic detection of an emotional state from human speech, which plays a crucial role in the area of human-machine interaction, has consistently been shown to be a difficult task for machine learning algorithms. Previous work on emotion recognition has mostly focused on the extraction of carefully hand-crafted and highly engineered features. Results from these works have demonstrated the importance of discriminative spatio-temporal features for modelling the continuous evolution of different emotions. Recently, spectrogram representations of emotional speech have achieved competitive performance for automatic speech emotion recognition (SER). How machine learning algorithms can learn effective compositional spatio-temporal dynamics for SER from such representations, herein denoted as deep spectrum representations, remains a fundamental problem. In this paper, we develop a model to address this problem by leveraging a parallel combination of attention-based bidirectional long short-term memory recurrent neural networks and attention-based fully convolutional networks (FCN). Extensive experiments were undertaken on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the FAU Aibo Emotion Corpus (FAU-AEC) to highlight the effectiveness of our approach. The experimental results indicate that deep spectrum representations extracted from the proposed model are well-suited to the task of SER, achieving a WA of 68.1% and a UA of 67.0% on IEMOCAP, and a UA of 45.4% on FAU-AEC. Key results indicate that the extracted deep representations, combined with a linear support vector classifier, are comparable in performance with eGeMAPS and ComParE, two standard acoustic feature representations.
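The attention mechanism described above pools a variable-length sequence of frame-level features (e.g. BLSTM outputs over a spectrogram) into a single utterance-level vector. The following is a minimal numpy sketch of additive attention pooling, not the paper's exact architecture; the weight matrices `w`, `b`, and `u` stand in for parameters that would be learned during training.

```python
import numpy as np

def attention_pool(H, w, b, u):
    """Additive attention pooling over frame-level features.

    H: (T, d) sequence of frame representations (e.g. BLSTM outputs).
    w, b, u: projection parameters (learned in a real model; random here).
    Returns a (d,) utterance-level vector as an attention-weighted sum.
    """
    scores = np.tanh(H @ w + b) @ u       # (T,) unnormalised frame scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                  # softmax over time
    return alpha @ H                      # weighted sum of frames

# Toy usage: 50 frames of 8-dimensional features.
rng = np.random.default_rng(0)
T, d = 50, 8
H = rng.standard_normal((T, d))
w = rng.standard_normal((d, d))
b = rng.standard_normal(d)
u = rng.standard_normal(d)
utt = attention_pool(H, w, b, u)
print(utt.shape)  # (8,)
```

The utterance-level vector can then be fed to a classifier such as the linear support vector machine mentioned in the abstract.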
Functional near-infrared spectroscopy (fNIRS) is a fast-developing, non-invasive functional brain imaging technology widely used in cognitive neuroscience, clinical research and neural engineering. However, effectively removing global physiological noise from the fNIRS signal remains a challenge. This noise arises from multiple physiological origins in both superficial tissues and the brain, and its complex temporal, spatial and frequency characteristics significantly influence the results. In the present study, we developed a novel wavelet-based method for fNIRS global physiological noise removal. The method is data-driven and does not rely on any additional hardware or on a subjective noise-component selection procedure. It consists of three steps. First, we use wavelet transform coherence to automatically detect the time-frequency points contaminated by global physiological noise. Second, we decompose the fNIRS signal using the wavelet transform and suppress the wavelet energy at the contaminated time-frequency points. Finally, we transform the signal back to a time series. We validated the method using simulated and real data in both task and resting states. The results showed that our method can effectively remove the global physiological noise from the fNIRS signal and improve the spatial specificity of both the task activation and the resting-state functional connectivity pattern.
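The core suppress-then-reconstruct pipeline can be illustrated with a toy numpy sketch using a multi-level Haar wavelet transform. This is not the paper's method: the coherence-based detection step is replaced by a hard-coded mask over the finest detail bands, and the signals are synthetic stand-ins for fNIRS data.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_forward(x, levels):
    """Multi-level Haar DWT; returns approximation + list of detail bands."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        d = (a[0::2] - a[1::2]) / SQRT2
        a = (a[0::2] + a[1::2]) / SQRT2
        details.append(d)
    return a, details

def haar_inverse(a, details):
    """Invert the multi-level Haar DWT."""
    for d in reversed(details):
        x = np.empty(2 * a.size)
        x[0::2] = (a + d) / SQRT2
        x[1::2] = (a - d) / SQRT2
        a = x
    return a

def suppress(details, masks):
    """Zero coefficients flagged as contaminated (toy stand-in for the
    wavelet-coherence detection step in the abstract)."""
    return [np.where(m, 0.0, d) for d, m in zip(details, masks)]

# Toy example at 10 Hz sampling: a slow haemodynamic-like signal plus a
# faster global oscillation acting as physiological noise.
t = np.arange(256) / 10.0
clean = np.sin(2 * np.pi * 0.05 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 3.0 * t)

a, det = haar_forward(noisy, levels=4)
# Pretend detection flagged the two finest detail bands (1.25-5 Hz) as noise.
masks = [np.ones_like(d, dtype=bool) if i < 2 else np.zeros_like(d, dtype=bool)
         for i, d in enumerate(det)]
denoised = haar_inverse(a, suppress(det, masks))
print(np.abs(denoised - clean).mean() < np.abs(noisy - clean).mean())  # True
```

With no mask applied, the forward/inverse pair reconstructs the input exactly, which is what makes selective coefficient suppression a clean way to remove noise confined to particular time-frequency points.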
The high prevalence of depression in society has given rise to a need for new digital tools that can aid its early detection. Among other effects, depression impacts the use of language. Seeking to exploit this, the present work focuses on the detection of depressed and non-depressed individuals through the analysis of linguistic information extracted from transcripts of clinical interviews with a virtual agent. Specifically, we investigated the advantages of employing hierarchical attention-based networks for this task. Using pretrained Global Vectors (GloVe) word embedding models to extract low-level representations of the words, we compared hierarchical local-global attention networks and hierarchical contextual attention networks. We performed our experiments on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WoZ) dataset, which contains audio, visual, and linguistic information acquired from participants during a clinical session. Our results on the DAIC-WoZ test set indicate that hierarchical contextual attention networks are the most suitable configuration for detecting depression from transcripts, achieving an Unweighted Average Recall (UAR) of .66 and surpassing our baseline, a Recurrent Neural Network that does not use attention.
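The hierarchical structure described above encodes a transcript in two stages: attention pools word embeddings into sentence vectors, then pools sentence vectors into a single document vector for classification. The following is a minimal numpy sketch of this two-level pooling with simple dot-product attention; the context vectors `u_word` and `u_sent` stand in for parameters that would be learned, and the random arrays stand in for GloVe embeddings.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attend(H, u):
    """Dot-product attention pooling: weight rows of H by similarity to
    context vector u and return their weighted sum."""
    alpha = softmax(H @ u)
    return alpha @ H

def hierarchical_encode(doc, u_word, u_sent):
    """Two-level pooling: words -> sentence vectors -> document vector.
    doc: list of (n_words_i, d) arrays of word embeddings (e.g. GloVe)."""
    sents = np.stack([attend(S, u_word) for S in doc])
    return attend(sents, u_sent)

# Toy transcript: three "sentences" of 4, 7, and 3 words, dimension 6.
rng = np.random.default_rng(1)
d = 6
doc = [rng.standard_normal((n, d)) for n in (4, 7, 3)]
u_word = rng.standard_normal(d)
u_sent = rng.standard_normal(d)
vec = hierarchical_encode(doc, u_word, u_sent)
print(vec.shape)  # (6,)
```

In a full model the resulting document vector would be passed to a classifier that outputs the depressed/non-depressed decision.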