Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech

Speech emotion recognition is an important and challenging task in the realm of human-computer interaction. Prior work has proposed a variety of models and feature sets for training such a system. In this work, we conduct extensive experiments using an attentive convolutional neural network with a multi-view learning objective function. We compare system performance using different lengths of the input signal, different types of acoustic features, and different types of emotional speech (improvised/scripted). Our experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database reveal that recognition performance depends strongly on the type of speech data, regardless of the choice of input features. Furthermore, we achieved state-of-the-art results on the improvised speech data of IEMOCAP.
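
The abstract does not spell out the attention layer. One common form of attentive pooling scores each time step of a CNN's output with a learned vector, turns the scores into weights with a softmax, and returns the weighted sum, so that inputs of different lengths map to a fixed-size vector. The sketch below assumes that form, in plain Python; the vector `u` and the function names are hypothetical, and training (including the multi-view objective) is not shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Sequence[float]],
                   u: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Collapse T feature vectors of dimension D into one D-dim vector.

    frames: per-time-step feature vectors (e.g. outputs of a CNN layer).
    u: hypothetical learned attention vector (dimension D).
    Returns (pooled vector, attention weights over the T time steps).
    """
    if not frames:
        raise ValueError("need at least one frame")
    # One scalar relevance score per time step: dot product with u.
    scores = [sum(x * w for x, w in zip(frame, u)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time; the output size does not depend on T.
    pooled = [sum(a * frame[d] for a, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_pool(frames, u=[10.0, 0.0])
    print(weights)  # most weight on the two frames whose first feature is 1
    print(pooled)
```

Because the pooled vector has a fixed size, the same classifier layer can follow it whatever the input signal length.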

Generating exact lattices in the WFST framework

Povey et al., 2012

We describe a lattice generation method that is exact, i.e. it satisfies all the natural properties we would want from a lattice of alternative transcriptions of an utterance. This method does not introduce substantial overhead over one-best decoding. Our method is most directly applicable when using WFST decoders where the WFST is "fully expanded", i.e. where the arcs correspond to HMM transitions. It outputs lattices that include HMM-state-level alignments as well as word labels. The general idea is to create a state-level lattice during decoding and then apply a special form of determinization that retains only the best-scoring path for each word sequence. This special determinization algorithm solves the following problem: given a WFST A, compute a WFST B that, for each input-symbol sequence of A, contains just the lowest-cost path through A.
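
The problem statement at the end of this abstract can be pinned down with a brute-force reference. The sketch below assumes an acyclic WFST in the tropical semiring (costs add along a path, lower is better), stored as a hypothetical adjacency dictionary with `None` marking epsilon labels. It enumerates every path and keeps the cheapest one per input-symbol sequence. It is not the paper's determinization algorithm, which does not enumerate paths; it only shows what that algorithm's output must contain.

```python
from typing import Dict, List, Optional, Tuple

# Arc: (input label, output label, cost, next state); None marks epsilon.
Arc = Tuple[Optional[str], Optional[str], float, int]
Best = Dict[Tuple[str, ...], Tuple[float, Tuple[str, ...], Tuple[int, ...]]]


def best_path_per_input(arcs: Dict[int, List[Arc]], start: int,
                        finals: Dict[int, float]) -> Best:
    """For each input-symbol sequence of an ACYCLIC WFST A, keep only the
    lowest-cost path (tropical semiring: costs add along a path, lower wins).

    Returns {input sequence: (cost, output sequence, state sequence)}.
    Enumerates every path, so it is exponential in general; it is a
    reference for the problem statement, not an efficient algorithm.
    """
    best: Best = {}

    def dfs(state: int, ilabels: List[str], olabels: List[str],
            cost: float, states: List[int]) -> None:
        if state in finals:
            total = cost + finals[state]
            key = tuple(ilabels)
            if key not in best or total < best[key][0]:
                best[key] = (total, tuple(olabels), tuple(states))
        for ilabel, olabel, arc_cost, nxt in arcs.get(state, []):
            dfs(nxt,
                ilabels + ([ilabel] if ilabel is not None else []),
                olabels + ([olabel] if olabel is not None else []),
                cost + arc_cost,
                states + [nxt])

    dfs(start, [], [], 0.0, [start])
    return best


if __name__ == "__main__":
    # Two paths share input "a b" (costs 2.0 and 2.5); one path reads "c".
    arcs = {
        0: [("a", "x", 1.0, 1), ("a", "y", 0.5, 2), ("c", "z", 3.0, 3)],
        1: [("b", None, 1.0, 3)],
        2: [("b", None, 2.0, 3)],
    }
    for seq, (cost, out, path) in best_path_per_input(arcs, 0, {3: 0.0}).items():
        print(seq, cost, out, path)
```

Building the WFST B itself from the surviving paths (one per input sequence) is omitted.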

Combining Recurrent and Convolutional Neural Networks for Relation Classification

Vu, Adel, Gupta et al., 2016

This paper investigates two different neural architectures for the task of relation classification: convolutional neural networks and recurrent neural networks. For both models, we demonstrate the effect of different architectural choices. We present a new context representation for convolutional neural networks for relation classification (extended middle context). Furthermore, we propose connectionist bi-directional recurrent neural networks and introduce a ranking loss for their optimization. Finally, we show that combining convolutional and recurrent neural networks using a simple voting scheme further improves results. Our neural models achieve state-of-the-art results on the SemEval 2010 relation classification task.
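
The abstract does not define the ranking loss. One widely used pairwise ranking loss for relation classification has two softplus terms: log(1 + exp(gamma * (m_pos - s_gold))) penalizes a gold-class score below the margin m_pos, and log(1 + exp(gamma * (m_neg + s_wrong))) penalizes the highest-scoring wrong class for lying above -m_neg. The sketch below assumes that form; the function names and default hyperparameter values are illustrative assumptions, not the paper's settings.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranking_loss(scores: Sequence[float], gold: int, gamma: float = 2.0,
                 m_pos: float = 2.5, m_neg: float = 0.5) -> float:
    """Pairwise ranking loss for a single example.

    scores: one real-valued score per relation class.
    gold: index of the correct class.
    The first term pushes the gold score above the margin m_pos; the second
    pushes the highest-scoring wrong class below -m_neg; gamma scales both.
    Default hyperparameter values are illustrative assumptions.
    """
    if len(scores) < 2:
        raise ValueError("need at least two classes")
    # Only the most competitive wrong class is penalized.
    best_wrong = max(s for c, s in enumerate(scores) if c != gold)
    return (softplus(gamma * (m_pos - scores[gold]))
            + softplus(gamma * (m_neg + best_wrong)))


if __name__ == "__main__":
    print(ranking_loss([3.0, -1.0, -2.0], gold=0))  # well separated: small loss
    print(ranking_loss([-1.0, 3.0, -2.0], gold=0))  # wrong class wins: large loss
```

Penalizing only the strongest competitor, rather than all wrong classes, keeps the gradient focused on the confusion that actually matters for the prediction.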

A first speech recognition system for Mandarin-English code-switch conversational speech

Lyu, Weiner et al., 2012

Multilingual deep neural network based acoustic modeling for rapid language adaptation

Imseng, Povey et al., 2014

This paper presents a study on multilingual deep neural network (DNN) based acoustic modeling and its application to new languages. We investigate the effect of phone merging on multilingual DNNs in the context of rapid language adaptation. Moreover, we explore the combination of multilingual DNNs with Kullback-Leibler divergence based acoustic modeling (KL-HMM). Using ten different languages from the GlobalPhone database, our studies reveal that crosslingual acoustic model transfer through multilingual DNNs is superior to unsupervised RBM pre-training and greedy layer-wise supervised training. We also found that KL-HMM based decoding consistently outperforms conventional hybrid decoding, especially in low-resource scenarios. Furthermore, the experiments indicate that multilingual DNN training benefits equally from simple phoneset concatenation and from manually derived universal phonesets.
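
In KL-HMM acoustic modeling, each HMM state holds a categorical distribution over the DNN's output classes, and the local cost of a frame in a state is a Kullback-Leibler divergence between that distribution and the DNN's posterior vector for the frame, instead of the scaled likelihood used in hybrid decoding. The sketch below computes only that local cost; the function names, the `eps` flooring, and the choice among the three divergence variants are assumptions, and state-distribution training and Viterbi decoding are omitted.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-10) -> float:
    """KL(p || q) for discrete distributions; q is floored at eps to avoid log(0)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i == 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def kl_hmm_local_cost(state_dist: Sequence[float], posterior: Sequence[float],
                      variant: str = "kl") -> float:
    """Local cost of aligning one frame to one HMM state (lower is better).

    state_dist: the state's categorical distribution over the K output classes
                of the DNN (e.g. phones), estimated during training.
    posterior:  the DNN's K class posteriors for the current frame.
    variant:    "kl" = KL(state || frame), "reverse" = KL(frame || state),
                "symmetric" = sum of both directions. Which one a given
                system uses is an assumption here.
    """
    if variant == "kl":
        return kl_divergence(state_dist, posterior)
    if variant == "reverse":
        return kl_divergence(posterior, state_dist)
    if variant == "symmetric":
        return (kl_divergence(state_dist, posterior)
                + kl_divergence(posterior, state_dist))
    raise ValueError(f"unknown variant: {variant!r}")


if __name__ == "__main__":
    frame = [0.9, 0.1]  # DNN posteriors over two phone classes
    for name, y in {"state A": [0.5, 0.5], "state B": [0.95, 0.05]}.items():
        print(name, kl_hmm_local_cost(y, frame))
```

Because the DNN only supplies posterior features here, the per-state distributions can be re-estimated on a small amount of target-language data, which is one plausible reading of why this setup suits low-resource scenarios.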

Improving Speech Emotion Recognition with Unsupervised Representation Learning on Unlabeled Speech

Neumann, 2019

GlobalPhone: A multilingual text & speech database in 20 languages

Schultz, Schlippe, 2013

Neural-based Context Representation Learning for Dialog Act Classification

Ortega, 2017

We explore context representation learning methods in neural-based models for dialog act classification. We propose and extensively compare different methods that combine recurrent neural network architectures and attention mechanisms (AMs) at different context levels. Our experimental results on two benchmark datasets show consistent improvements over models without contextual information and reveal that the most suitable AM in the architecture depends on the nature of the dataset.

Ngoc Thang Vu

Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech