Kehua Lei scite author profile

Kehua Lei

5 Publications

11 Citation Statements Received

54 Citation Statements Given


Affiliations

Tsinghua University

Publications


Inferring Emotion from Conversational Voice Data: A Semi-Supervised Multi-Path Generative Neural Network Approach

Zhou, Jia, Wang, et al. 2018

AAAI


To give more humanized responses in Voice Dialogue Applications (VDAs), inferring emotional states from users’ queries may play an important role. However, VDAs have a tremendous number of users and a massive amount of unlabeled data with high-dimensional features from multimodal information, which challenges traditional speech emotion recognition methods. In this paper, to better infer emotion from conversational voice data, we propose a semi-supervised multi-path generative neural network. Specifically, we first build a novel supervised multi-path deep neural network framework. To avoid high-dimensional input, raw features are trained in groups by local classifiers. The high-level features of each local classifier are then concatenated as input to a global classifier. These two kinds of classifiers are trained simultaneously through a single objective function to achieve more effective and discriminative emotion inference. To further address the scarcity of labeled data, we extend the multi-path deep neural network to a generative model based on a semi-supervised variational autoencoder (semi-VAE), which can be trained on labeled and unlabeled data simultaneously. Experiments on a real-world dataset of size 24,000 collected from Sogou Voice Assistant (SVAD13) and on the benchmark dataset IEMOCAP show that our method significantly outperforms the existing state-of-the-art results.
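The supervised part of the architecture described in this abstract can be illustrated with a minimal sketch. It only evaluates a joint objective in a single forward pass and does not train anything, and it leaves out the semi-VAE extension entirely. The feature groups, layer sizes, tanh activation, random weights, and the equal weighting of local and global losses are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def init_path(rng, n_in, n_hidden, n_classes):
    # One local classifier: a hidden layer plus an output layer (hypothetical sizes).
    return {"W_h": rng.normal(0, 0.1, (n_in, n_hidden)),
            "W_o": rng.normal(0, 0.1, (n_hidden, n_classes))}

def joint_loss(feature_groups, labels, local_paths, global_weights):
    hidden_features, loss = [], 0.0
    for x, path in zip(feature_groups, local_paths):
        h = np.tanh(x @ path["W_h"])            # high-level features of this path
        hidden_features.append(h)
        loss += cross_entropy(softmax(h @ path["W_o"]), labels)  # local loss
    concatenated = np.concatenate(hidden_features, axis=1)
    global_probs = softmax(concatenated @ global_weights)
    loss += cross_entropy(global_probs, labels)  # global loss, equal weight (assumption)
    return loss, global_probs

rng = np.random.default_rng(0)
group_dims = [6, 4]                              # two hypothetical feature groups
n_samples, n_hidden, n_classes = 8, 5, 3
feature_groups = [rng.normal(size=(n_samples, d)) for d in group_dims]
labels = rng.integers(0, n_classes, size=n_samples)
local_paths = [init_path(rng, d, n_hidden, n_classes) for d in group_dims]
global_weights = rng.normal(0, 0.1, (n_hidden * len(group_dims), n_classes))

loss, global_probs = joint_loss(feature_groups, labels, local_paths, global_weights)
print(f"joint loss: {loss:.4f}, global output shape: {global_probs.shape}")
```

Summing the local and global losses into one scalar is what would let a single optimizer update both kinds of classifiers at once; how the paper actually weights the terms is not stated here.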


Design and Implementation of a Disambiguity Framework for Smart Voice Controlled Devices

Lei, Jia, et al. 2019


With about 100 million people using them recently, Smart Voice Controlled Devices (SVCDs) are becoming commonplace. Whether at home or in an office, multiple appliances are usually under the control of a single SVCD, and several people may operate an SVCD simultaneously. However, present SVCDs fail to handle these situations appropriately. In this paper, we propose a novel framework for SVCDs to eliminate ambiguity in commands from a single user or multiple users. We also design an algorithm combining Word2Vec and emotion detection that lets the device resolve ambiguity. Finally, we apply our framework to a virtual smart home scene, and its performance indicates that our strategy resolves these problems effectively.


AI Painting

Zhang, Lei, Jia, et al. 2018


TwistBlocks

Wang, Lei, et al. 2018


Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional Modeling

Zhou, Jia, Yin, et al. 2019


Teaching style plays an influential role in helping students achieve academic success. In this paper, we explore a new problem of effectively understanding teachers' teaching styles. Specifically, we study 1) how to quantitatively characterize the teaching styles of various teachers and 2) how to model the subtle relationship between cross-media teaching-related data (speech, facial expressions, body motions, content, etc.) and teaching styles. Using adjectives selected from more than 10,000 feedback questionnaires provided by an educational enterprise, a novel concept called Teaching Style Semantic Space (TSSS) is developed based on the pleasure-arousal dimensional theory to describe teaching styles quantitatively and comprehensively. Then a multi-task deep learning based model, the Attention-based Multi-path Multi-task Deep Neural Network (AMMDNN), is proposed to accurately and robustly capture the internal correlations between cross-media features and TSSS. Based on the benchmark dataset, we further develop a comprehensive dataset including 4,541 fully annotated cross-modality teaching classes. Our experimental results demonstrate that the proposed AMMDNN outperforms baseline methods (+0.0842 in terms of the concordance correlation coefficient (CCC) on average). To further demonstrate the advantages of the proposed TSSS and our model, several interesting case studies are carried out, such as comparing teaching styles among different teachers and courses, and leveraging the proposed method for teaching quality analysis.
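For readers unfamiliar with the metric reported in this abstract, the concordance correlation coefficient can be computed as in the short sketch below. It uses the standard definition (twice the covariance divided by the sum of the two variances and the squared difference of the means); the function name and the sample values are illustrative and do not come from the paper.

```python
import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    """Standard CCC: 2*cov / (var_true + var_pred + (mean_true - mean_pred)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()   # population variances
    covariance = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    return 2.0 * covariance / (var_true + var_pred + (mean_true - mean_pred) ** 2)

# Illustrative values only: predictions shifted by a constant are penalized,
# even though their Pearson correlation with the targets is perfect.
targets = [1.0, 2.0, 3.0]
print(concordance_correlation_coefficient(targets, targets))
print(concordance_correlation_coefficient(targets, [2.0, 3.0, 4.0]))
```

Unlike Pearson correlation, CCC drops when predictions are offset or rescaled relative to the targets, which is why it is commonly used for dimensional (continuous) affect prediction.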

