Observational studies of couple interactions often rely on manual annotation of a set of behavior codes. Such annotations are expensive, time-consuming, and often suffer from low inter-annotator agreement. Previous studies have shown that the lexical channel contains sufficient information for capturing behavior and predicting interaction labels, and various automated approaches based on language models have been proposed. However, current methods are restricted to a small context window due to the difficulty of training language models with limited data, as well as the lack of frame-level labels. In this paper we investigate the application of recurrent neural networks for capturing behavior trajectories over larger context windows. We address data sparsity and improve robustness by introducing out-of-domain knowledge through pretrained word representations. Finally, we show that our system can accurately estimate the true ratings of couple interactions by fusing the frame-level behavior trajectories. The ratings predicted by our proposed system achieve inter-annotator agreement comparable to that of trained human annotators. Importantly, our system promises robust handling of out-of-domain data, exploitation of longer context, online feedback with continuous labels, and easy fusion with other modalities.
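As a rough illustration of the pipeline this abstract describes, the PyTorch sketch below runs an LSTM over pretrained word embeddings to produce a frame-level behavior trajectory, then fuses the trajectory into a session-level rating by mean pooling. The class name, layer sizes, and the mean-pooling fusion are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: pretrained word embeddings -> LSTM -> frame-level
# behavior trajectory -> fused session-level rating. Sizes are assumed.
import torch
import torch.nn as nn

class BehaviorTrajectoryRNN(nn.Module):
    def __init__(self, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # one behavior code

    def forward(self, embeddings):
        # embeddings: (batch, num_frames, embed_dim) pretrained word vectors
        states, _ = self.rnn(embeddings)
        trajectory = self.head(states).squeeze(-1)  # frame-level scores
        session_rating = trajectory.mean(dim=1)     # simple fusion to a rating
        return trajectory, session_rating

model = BehaviorTrajectoryRNN()
frames = torch.randn(2, 50, 300)  # e.g., pretrained vectors for 50 words
trajectory, rating = model(frames)
```

The key point is that the recurrent state lets each frame-level score depend on a long history of preceding words, which is what allows the larger context window discussed above.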
Contextualized word embeddings such as ELMo have recently been shown to model word semantics more effectively through contextualized learning on large-scale language corpora, yielding significant improvements in the state of the art across many natural language tasks. In this work we integrate acoustic information into contextualized lexical embeddings by adding multimodal inputs to a pretrained bidirectional language model. The language model is trained on spoken language comprising both text and audio modalities. The resulting representations are multimodal and contain paralinguistic information, which can modify word meanings and provide affective cues. We show that these multimodal embeddings improve over previous state-of-the-art multimodal models in emotion recognition on the CMU-MOSEI dataset.
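The core idea lends itself to a short sketch: fuse word-aligned acoustic features with token embeddings before a bidirectional LSTM language model. The concatenation fusion, the class name, and all dimensions below are our assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch of a multimodal bidirectional language model: token
# embeddings concatenated with aligned acoustic features feed a biLSTM.
import torch
import torch.nn as nn

class MultimodalBiLM(nn.Module):
    def __init__(self, vocab_size, text_dim=256, audio_dim=74, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, text_dim)
        self.bilstm = nn.LSTM(text_dim + audio_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.fwd_head = nn.Linear(hidden_dim, vocab_size)  # predicts next word
        self.bwd_head = nn.Linear(hidden_dim, vocab_size)  # predicts previous word

    def forward(self, tokens, audio):
        # tokens: (batch, seq); audio: (batch, seq, audio_dim), word-aligned
        x = torch.cat([self.embed(tokens), audio], dim=-1)
        states, _ = self.bilstm(x)              # (batch, seq, 2 * hidden_dim)
        h_fwd, h_bwd = states.chunk(2, dim=-1)  # split the two directions
        # `states` serve as the multimodal contextual embeddings downstream
        return states, self.fwd_head(h_fwd), self.bwd_head(h_bwd)
```

After language-model pretraining, the hidden states would be extracted as fixed multimodal embeddings for a downstream emotion classifier.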
Identifying complex behavior in human interactions for observational studies often involves the tedious process of transcribing and annotating large amounts of data. While there is significant progress toward accurate transcription in Automatic Speech Recognition, automatic Natural Language Understanding of high-level human behaviors from transcribed text is still at an early stage of development. In this paper we present a novel approach for modeling human behavior using sentence embeddings and propose an automatic behavior annotation framework. We explore unsupervised methods of distilling semantic information into deep sentence embeddings using seq2seq models, and demonstrate that these embeddings capture behaviorally meaningful information. Our proposed framework uses LSTM Recurrent Neural Networks to estimate behavior trajectories from these sentence embeddings. Finally, we employ fusion to compare our high-resolution behavioral trajectories with the coarse, session-level behavioral ratings of human annotators in Couples Therapy. Our experiments show that behavior annotation using this framework outperforms prior methods and approaches or exceeds human performance in terms of annotator agreement.
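A minimal sketch of the two stages described, under assumed dimensions: (1) the encoder half of a seq2seq model whose final hidden state serves as the sentence embedding, and (2) an LSTM that maps a session's sequence of sentence embeddings to a behavior trajectory. Names like `SentenceEncoder` are illustrative, not the authors'.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Encoder half of a seq2seq model; its final state is the embedding."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):               # tokens: (batch, seq)
        _, (h_n, _) = self.rnn(self.embed(tokens))
        return h_n[-1]                       # (batch, hidden_dim)

class TrajectoryLSTM(nn.Module):
    """Maps a session's sentence embeddings to behavior scores over time."""
    def __init__(self, embed_dim=512, hidden_dim=128):
        super().__init__()
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, sent_embs):            # (batch, n_sents, embed_dim)
        states, _ = self.rnn(sent_embs)
        return self.head(states).squeeze(-1) # behavior trajectory
```

In the full seq2seq setup, a decoder reconstructing or predicting the sentence would provide the unsupervised training signal; only the encoder is needed at inference time.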
State-of-the-art audio event detection (AED) systems rely on supervised learning using strongly labeled data. However, this dependence severely limits scalability to large-scale datasets where fine resolution annotations are too expensive to obtain. In this paper, we propose a small-footprint multiple instance learning (MIL) framework for multi-class AED using weakly annotated labels. The proposed MIL framework uses audio embeddings extracted from a pre-trained convolutional neural network as input features. We show that by using audio embeddings the MIL framework can be implemented using a simple DNN with performance comparable to recurrent neural networks. We evaluate our approach by training an audio tagging system using a subset of AudioSet, which is a large collection of weakly labeled YouTube video excerpts. Combined with a late-fusion approach, we improve the F1 score of a baseline audio tagging system by 17%. We show that audio embeddings extracted by the convolutional neural networks significantly boost the performance of all MIL models. This framework reduces the model complexity of the AED system and is suitable for applications where computational resources are limited.
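A hedged illustration of the MIL setup for weak labels: a small DNN scores each audio-embedding segment (instance), and pooling over instances yields a clip-level (bag) prediction trained against the weak clip label. The layer sizes and the choice of max pooling are assumptions for illustration; other MIL pooling functions are possible.

```python
# MIL sketch: instance-level DNN + max pooling over instances gives a
# bag-level prediction trained against weak (clip-level) labels.
import torch
import torch.nn as nn

class MILTagger(nn.Module):
    def __init__(self, embed_dim=128, num_classes=10):
        super().__init__()
        self.instance_net = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, bag):
        # bag: (batch, num_segments, embed_dim) CNN audio embeddings
        inst_probs = torch.sigmoid(self.instance_net(bag))  # per-instance
        bag_probs, _ = inst_probs.max(dim=1)                # MIL max pooling
        return bag_probs                                    # (batch, num_classes)

model = MILTagger()
clips = torch.randn(4, 10, 128)                    # 10 segments per clip
weak_labels = torch.randint(0, 2, (4, 10)).float() # clip-level tags only
loss = nn.functional.binary_cross_entropy(model(clips), weak_labels)
```

Because only the bag-level prediction is supervised, no segment-level annotations are required, which is what makes weakly labeled collections like AudioSet usable for training.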
Unsupervised learning has been an attractive method for easily deriving meaningful data representations from vast amounts of unlabeled data. These representations, or embeddings, often yield superior results in many tasks, whether used directly or as features in subsequent training stages. However, the quality of the embeddings is highly dependent on the assumed knowledge in the unlabeled data and how the system extracts information without supervision. Domain portability is also very limited in unsupervised learning, often requiring retraining on other in-domain corpora to achieve robustness. In this work we present a multitask paradigm for unsupervised contextual learning of behavioral interactions which addresses unsupervised domain adaptation. We introduce an online multitask objective into unsupervised learning and show that sentence embeddings generated through this process increase performance on affective tasks.
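One way to picture an online multitask objective is a shared sentence encoder feeding several unsupervised task heads whose losses are summed at every step. The two tasks below (bag-of-words reconstruction and next-sentence embedding prediction) and the equal weighting are placeholder assumptions, not the paper's actual tasks.

```python
# Sketch of a shared encoder trained with two unsupervised losses per step.
import torch
import torch.nn as nn

class MultitaskEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.vocab_size = vocab_size
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.recon_head = nn.Linear(hidden_dim, vocab_size)  # task 1 head
        self.next_head = nn.Linear(hidden_dim, hidden_dim)   # task 2 head

    def encode(self, tokens):
        _, (h_n, _) = self.encoder(self.embed(tokens))
        return h_n[-1]                        # shared sentence embedding

def multitask_step(model, sent, next_sent):
    z = model.encode(sent)
    # task 1: reconstruct the sentence as a bag of words (multi-hot target)
    bow = torch.zeros(sent.size(0), model.vocab_size).scatter_(1, sent, 1.0)
    loss_recon = nn.functional.binary_cross_entropy_with_logits(
        model.recon_head(z), bow)
    # task 2: predict the next sentence's embedding (treated as a fixed target)
    with torch.no_grad():
        target = model.encode(next_sent)
    loss_next = nn.functional.mse_loss(model.next_head(z), target)
    return loss_recon + loss_next             # summed multitask objective
```

The intent of sharing one encoder across tasks is that no single unsupervised signal dominates, which is one plausible route to the domain robustness the abstract claims.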