2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2015
DOI: 10.1109/asru.2015.7404814
Using bidirectional LSTM recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of non-native spontaneous speech

Cited by 69 publications (32 citation statements)
References 17 publications
“…In this section, we will briefly overview the preliminary work of the bidirectional long short-term memory (BiLSTM) network [25] and then address how we can apply it to the EEG feature extraction task.…”
Section: Preliminary
confidence: 99%
“…However, one shortcoming of the conventional LSTM is that it only makes use of the previous context. The BiLSTM module processes the data in two directions, each with its own separate hidden layer [25]. As a result, compared with the traditional LSTM model, BiLSTM can access long-range context in both input directions, and hence it is better suited to modeling time sequences.…”
Section: Preliminary
confidence: 99%
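The two-direction processing described in the statement above can be sketched in a minimal toy form. This is purely illustrative, not the cited paper's model: it uses scalar "RNN" cells with hand-picked weights (`w`, `u` are arbitrary assumptions) to show how forward and backward hidden states are aligned and concatenated per timestep.

```python
import math

def rnn_pass(xs, w=0.5, u=0.3):
    """One-direction toy recurrence: h_t = tanh(w * x_t + u * h_{t-1})."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

def bidirectional_pass(xs):
    """Pair each timestep's forward state with its backward state,
    so every position sees context from both input directions."""
    fwd = rnn_pass(xs)
    # Run over the reversed input, then flip so backward states
    # line up with their original positions.
    bwd = rnn_pass(list(reversed(xs)))[::-1]
    return list(zip(fwd, bwd))

states = bidirectional_pass([1.0, 2.0, 3.0, 4.0])
```

In a real BiLSTM the scalar cell would be a full LSTM cell and the two directions would use separately trained weight matrices, but the alignment-and-concatenation step is the same idea.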
“…This is further illustrated by the fact that for general questions, the speech-only model performed as well as the text-only model. We also note that recent work by Yu et al. (2016) used neural networks to learn high-level abstractions from frame-to-frame acoustic properties of the signal and showed that these features provided a very limited gain over the features considered in this study.…”
Section: Discussion
confidence: 99%
“…For our initial work on investigating the use of automated speech rating in dialog systems, we relaxed the aforementioned constraint in favor of creating a real-time-able system. To this end, we implemented a hybrid recurrent neural network framework that comes with minimal manual effort and cost and high scoring accuracy and speed (Yu et al., 2015). In the proposed framework, we used generic time-sequence features extracted directly from the audio input instead of manually designed features, thus saving on human transcription effort and expert knowledge for training and optimizing the speech recognition engine for the rater.…”
Section: Speech Scoring
confidence: 99%