Abstract: Emotion recognition is a crucial problem in Human-Computer Interaction (HCI). Various techniques have been applied to enhance the robustness of emotion recognition systems using electroencephalogram (EEG) signals, particularly for the problem of learning spatiotemporal features. In this paper, a novel EEG-based emotion recognition approach is proposed, in which 3-Dimensional Convolutional Neural Networks (3D-CNN) are investigated for emotion recognition from multi-channel EEG data. A data augmentation phase is developed to enhance the performance of the proposed 3D-CNN approach, and a 3D data representation is formulated from the multi-channel EEG signals to serve as input to the proposed 3D-CNN model. Extensive experiments are conducted on the DEAP dataset (Dataset of Emotion Analysis using EEG, Physiological, and Video Signals). The proposed method achieves recognition accuracies of 87.44% and 88.49% for the valence and arousal classes respectively, outperforming state-of-the-art methods.
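The pipeline described in this abstract can be illustrated with a short sketch. The snippet below is an assumed reading of the approach, not the authors' released code: it reshapes a multi-channel EEG trial into a 3D volume of fixed-length segments and passes it through a small Keras 3D-CNN. The segment length, number of chunks, layer sizes, and two-class output head are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' exact architecture): stack multi-channel
# EEG segments into a 3D volume and classify it with a small 3D-CNN.
import numpy as np
import tensorflow as tf

N_CHANNELS, SEG_LEN, N_CHUNKS = 32, 128, 6   # assumed: 32 EEG channels, 128-sample windows

def to_3d_volume(eeg):
    """Split each channel into fixed-length chunks -> (chunks, channels, seg_len, 1)."""
    chunks = eeg[:, :N_CHUNKS * SEG_LEN].reshape(N_CHANNELS, N_CHUNKS, SEG_LEN)
    return chunks.transpose(1, 0, 2)[..., np.newaxis].astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_CHUNKS, N_CHANNELS, SEG_LEN, 1)),
    tf.keras.layers.Conv3D(8, kernel_size=3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling3D(pool_size=2),
    tf.keras.layers.Conv3D(16, kernel_size=3, activation="relu", padding="same"),
    tf.keras.layers.GlobalAveragePooling3D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. low vs. high valence
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Random data stands in for real DEAP trials (32 channels x 8064 samples per trial).
x = np.stack([to_3d_volume(np.random.randn(N_CHANNELS, 8064)) for _ in range(4)])
print(model.predict(x).shape)                 # (4, 2)
```

Data augmentation (e.g. adding noisy copies of training segments) would be applied to the volumes before training; the abstract does not specify the exact scheme, so it is omitted here.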
Speech recognition for people with speech disorders is a difficult task due to their limited motor control of the speech articulators. Multimodal speech recognition can be used to enhance robustness for disordered speech. This paper introduces an automatic speech recognition system for people with the dysarthria speech disorder based on both acoustic and visual components. Mel-Frequency Cepstral Coefficients (MFCC) are used as features representing the acoustic speech signal. For the visual counterpart, Discrete Cosine Transform (DCT) coefficients are extracted from the speaker's mouth region; face and mouth regions are detected using the Viola-Jones algorithm. The acoustic and visual features are then concatenated into one feature vector, and a Hidden Markov Model (HMM) classifier is applied to the combined audio-visual feature vector. The system is tested on isolated English words spoken by dysarthric speakers from the UA-Speech dataset. Results indicate that the visual features are highly effective, improving recognition accuracy by 7.91% for speaker-dependent experiments and 3% for speaker-independent experiments.
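A brief sketch can make the audio-visual fusion step concrete. The code below is an assumed reconstruction rather than the paper's implementation: it extracts MFCCs with librosa, computes 2-D DCT coefficients from pre-cropped mouth regions (Viola-Jones detection is omitted here), concatenates the two streams frame by frame, and scores isolated words with one GaussianHMM per word from hmmlearn. Feature dimensions, frame alignment, and HMM settings are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's exact pipeline): MFCC acoustic features,
# 2-D DCT visual features from mouth ROIs, feature concatenation, per-word GaussianHMM.
import numpy as np
import librosa
from scipy.fftpack import dct
from hmmlearn import hmm

def acoustic_features(wav_path, n_mfcc=13):
    """MFCC features for one utterance -> (frames, n_mfcc)."""
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def visual_features(mouth_frames, block=(5, 4)):
    """2-D DCT of each grayscale mouth ROI; keep a low-frequency block of coefficients."""
    feats = []
    for roi in mouth_frames:                                     # roi: (H, W) float array
        coeffs = dct(dct(roi, axis=0, norm="ortho"), axis=1, norm="ortho")
        feats.append(coeffs[:block[0], :block[1]].flatten())     # 20 low-frequency coeffs
    return np.array(feats)

def fuse(acoustic, visual):
    """Resample the visual stream to the acoustic frame rate, then concatenate."""
    idx = np.linspace(0, len(visual) - 1, len(acoustic)).astype(int)
    return np.hstack([acoustic, visual[idx]])                    # (frames, 13 + 20)

def train_word_model(feature_sequences, n_states=5):
    """Fit one HMM on all fused feature sequences of a single word."""
    X = np.vstack(feature_sequences)
    lengths = [len(s) for s in feature_sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def recognise(word_models, features):
    """Pick the word whose HMM assigns the highest log-likelihood."""
    return max(word_models, key=lambda w: word_models[w].score(features))
```

At test time, `recognise` is called with a dictionary mapping each isolated word to its trained model; the word with the best score is returned as the hypothesis.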