Decades of research have been devoted to developing and evaluating methods for automated emotion recognition. As technology advances, a growing range of applications requires recognition of the user's emotional state. This paper investigates a robust approach for multimodal emotion recognition in conversation. Three separate models, one each for the audio, video, and text modalities, are built and fine-tuned on the MELD dataset. Transformer-based cross-modality fusion with the EmbraceNet architecture is then employed to estimate the emotion. The proposed multimodal network achieves up to 65% accuracy, significantly surpassing each of the unimodal models. Multiple evaluation techniques show that the model is robust and can even outperform state-of-the-art models on MELD.
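A minimal sketch of the EmbraceNet-style fusion stage is given below, assuming PyTorch; the unimodal feature sizes are placeholders, and the 7 output classes correspond to MELD's emotion labels. EmbraceNet docks each modality into a shared embedding and assembles the fused vector by letting a multinomial draw decide, per dimension, which modality contributes, which is what gives the fusion its robustness to weak or missing modalities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbraceFusion(nn.Module):
    """EmbraceNet-style fusion head (sketch). Each modality is 'docked'
    into a shared embedding; the fused vector is assembled by sampling,
    per output dimension, which modality supplies the value."""

    def __init__(self, input_dims=(512, 512, 768), embed_dim=256, num_classes=7):
        super().__init__()
        # One docking layer per modality (e.g. audio, video, text encoders).
        self.docking = nn.ModuleList(nn.Linear(d, embed_dim) for d in input_dims)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, feats, probs=None):
        # feats: list of tensors, feats[k] has shape (batch, input_dims[k])
        docked = torch.stack(
            [torch.relu(dock(f)) for dock, f in zip(self.docking, feats)], dim=1
        )                                              # (batch, m, embed_dim)
        batch, m, c = docked.shape
        if probs is None:                              # equal trust by default
            probs = torch.full((m,), 1.0 / m, device=docked.device)
        # Sample, for every embedding dimension, the contributing modality.
        idx = torch.multinomial(probs.expand(batch, m), c, replacement=True)
        mask = F.one_hot(idx, m).permute(0, 2, 1).to(docked.dtype)
        fused = (docked * mask).sum(dim=1)             # (batch, embed_dim)
        return self.classifier(fused)
```

During training, the sampling probabilities can be biased toward more reliable modalities or used to simulate modality dropout.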
This paper presents a method for extracting novel spectral features based on a sinusoidal model. The method focuses on characterizing the spectral shapes of audio signals using spectral peaks in frequency sub-bands. The extracted features are evaluated for predicting the levels of the emotional dimensions of arousal and valence. Principal component regression, partial least squares regression, and deep convolutional neural network (CNN) models are used as prediction models. The experimental results indicate that the proposed features capture additional spectral information that common baseline features may miss. Since audio signal quality, especially timbre, plays a major role in the perception of emotional valence in music, including the presented features contributes to a lower prediction error.
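The abstract does not spell out the exact sinusoidal-model feature set, so the following is only an illustrative stand-in, assuming NumPy/SciPy: it keeps the strongest spectral peak per frame in each of several log-spaced frequency sub-bands and summarizes its magnitude over time, which is the general flavor of a sub-band peak descriptor.

```python
import numpy as np
from scipy.signal import stft, find_peaks

def subband_peak_features(x, sr, n_bands=8, nfft=2048):
    """Illustrative sub-band spectral-peak descriptor (not the paper's
    exact features): per frame, take the strongest spectral peak in each
    log-spaced sub-band, then summarize its magnitude across frames."""
    f, _, Z = stft(x, fs=sr, nperseg=nfft)
    mag = np.abs(Z)                                  # (freq_bins, frames)
    # Log-spaced band edges, skipping the DC bin.
    edges = np.logspace(np.log10(f[1]), np.log10(f[-1]), n_bands + 1)
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = mag[(f >= lo) & (f < hi), :]
        peak_mag = []
        for frame in band.T:
            peaks, _ = find_peaks(frame)
            peak_mag.append(frame[peaks].max() if peaks.size else 0.0)
        # Temporal mean and spread of the per-frame peak magnitude.
        feats.extend([np.mean(peak_mag), np.std(peak_mag)])
    return np.asarray(feats)                         # (2 * n_bands,)
```

The resulting vector would then feed the regression or CNN models alongside the baseline features.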
This late-breaking report presents a method for learning the sequential and temporal mapping between music and dance via the Sequence-to-Sequence (Seq2Seq) architecture. In this study, the Seq2Seq model comprises two parts: an encoder that processes the music inputs and a decoder that generates the output motion vectors. The model can accept music features and motion inputs from the user during human-robot interactive learning sessions, and it outputs motion patterns that teach corrective movements for following the moves of an expert dancer. Three types of Seq2Seq models are compared in the results and applied to a simulation platform. The model will be applied in social interaction scenarios with children with autism spectrum disorder (ASD).
CCS Concepts: • Computer systems organization → Robotics; • Computing methodologies → Neural networks; • Human-centered computing → Collaborative interaction.
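As one concrete reading of the encoder-decoder split, the sketch below maps a music-feature sequence to motion vectors in PyTorch; the GRU cells, feature dimensions, and autoregressive decoding are assumptions, since the abstract does not fix the specific recurrent architecture.

```python
import torch
import torch.nn as nn

class MusicToMotionSeq2Seq(nn.Module):
    """Minimal GRU-based Seq2Seq sketch: encode a music-feature sequence,
    then autoregressively decode a motion-vector sequence. The feature
    sizes are illustrative placeholders."""

    def __init__(self, music_dim=34, motion_dim=24, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(music_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(motion_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, motion_dim)

    def forward(self, music, motion_seed, steps):
        # music: (batch, T, music_dim); motion_seed: (batch, 1, motion_dim)
        _, h = self.encoder(music)           # music context -> hidden state
        frame, outputs = motion_seed, []
        for _ in range(steps):               # autoregressive motion decoding
            dec_out, h = self.decoder(frame, h)
            frame = self.out(dec_out)        # predict the next motion vector
            outputs.append(frame)
        return torch.cat(outputs, dim=1)     # (batch, steps, motion_dim)
```

Training would typically minimize the error between the generated and expert motion sequences, optionally with teacher forcing on the decoder inputs.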
The extended Kalman filter (EKF) is one of the most widely used Bayesian estimation methods in optimal control. Recent work on mobile robot control and transportation systems has applied various EKF methods, especially for localization. However, it is difficult to obtain adequate and reliable process-noise and measurement-noise models because of complex, dynamic surroundings and sensor uncertainty. Manufacturers typically provide default noise values for their sensors, but the appropriate values can change frequently with the environment. This paper therefore focuses on designing a highly accurate, trainable EKF-based localization framework that uses inertial measurement units (IMUs) for dead reckoning on an autonomous ground vehicle (AGV), with the goal of fusing it with a laser imaging, detection, and ranging (LiDAR) based simultaneous localization and mapping (SLAM) estimate to enhance performance. Convolutional neural networks (CNNs), backpropagation, and gradient descent are used to optimize the parameters of the framework. Furthermore, a unique cost function is developed for training the models to improve EKF accuracy. The proposed approach is general and applicable to diverse IMU-aided robot localization models.
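A minimal differentiable EKF step in this spirit, with learnable noise covariances, might look like the sketch below (PyTorch). The unicycle motion model, the state layout [x, y, θ], and the identity measurement model for the SLAM pose are illustrative assumptions, and a plain squared-error loss would stand in for the paper's custom cost, which the abstract does not specify.

```python
import torch
import torch.nn as nn

class TrainableEKF(nn.Module):
    """Sketch of an EKF whose noise models are learned by gradient descent:
    the log-diagonals of Q (process) and R (measurement) are free parameters,
    trained so the filtered pose tracks reference (e.g. LiDAR-SLAM) poses."""

    def __init__(self):
        super().__init__()
        self.log_q = nn.Parameter(torch.zeros(3))   # learnable process noise
        self.log_r = nn.Parameter(torch.zeros(3))   # learnable measurement noise

    def step(self, x, P, v, omega, z, dt):
        # x: state [x, y, theta]; v: speed; omega: IMU yaw rate; z: SLAM pose
        Q, R = torch.diag(self.log_q.exp()), torch.diag(self.log_r.exp())
        # --- predict: dead reckoning with a unicycle motion model ---
        theta = x[2]
        x_pred = x + dt * torch.stack([v * torch.cos(theta),
                                       v * torch.sin(theta),
                                       omega])
        F = torch.eye(3)                             # motion-model Jacobian
        F[0, 2] = -dt * v * torch.sin(theta)
        F[1, 2] = dt * v * torch.cos(theta)
        P_pred = F @ P @ F.T + Q
        # --- update: fuse the SLAM pose (H = I for a direct pose sensor;
        # angle wrap-around of the heading innovation omitted for brevity) ---
        S = P_pred + R
        K = P_pred @ torch.linalg.inv(S)             # Kalman gain
        x_new = x_pred + K @ (z - x_pred)
        P_new = (torch.eye(3) - K) @ P_pred
        return x_new, P_new
```

Because every operation is differentiable, unrolling this step over a trajectory and backpropagating the pose error through time lets gradient descent tune `log_q` and `log_r` instead of relying on the manufacturer's fixed noise defaults.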