EmbraceNet: A robust deep learning architecture for multimodal classification

Choi, Jun Ho; Lee, Jong Seok

doi:10.1016/j.inffus.2019.02.010

Cited by 116 publications

(98 citation statements)

References 48 publications

“…The performance then improves quickly (e.g. from 94.4% to 96.0% for {motion, sound}) for smoothing window size [15,40] seconds, and then slowly (e.g. from 96.0% to 96.8%) for smoothing window size [40,80] seconds.…”

Section: Post-processing Results (mentioning)

confidence: 99%

Transportation mode recognition fusing wearable motion, sound and vision sensors

et al. 2020

We present the first work that investigates the potential of improving the performance of transportation mode recognition through fusing multimodal data from wearable sensors: motion, sound and vision. We first train three independent deep neural network (DNN) classifiers, which work with the three types of sensors, respectively. We then propose two schemes that fuse the classification results from the three mono-modal classifiers. The first scheme makes an ensemble decision with fixed rules including Sum, Product, Majority Voting, and Borda Count. The second scheme is an adaptive fuser built as another classifier (including Naive Bayes, Decision Tree, Random Forest and Neural Network) that learns enhanced predictions by combining the outputs from the three mono-modal classifiers. We verify the advantage of the proposed method with the state-of-the-art Sussex-Huawei Locomotion and Transportation (SHL) dataset recognizing the eight transportation activities: Still, Walk, Run, Bike, Bus, Car, Train and Subway. We achieve F1 scores of 79.4%, 82.1% and 72.8% with the mono-modal motion, sound and vision classifiers, respectively. The F1 score is remarkably improved to 94.5% and 95.5% by the two data fusion schemes, respectively. The recognition performance can be further improved with a post-processing scheme that exploits the temporal continuity of transportation. When assessing generalization of the model to unseen data, we show that while performance is reduced, as expected, for each individual classifier, the benefits of fusion are retained with performance improved by 15 percentage points. Besides the actual performance increase, this work, most importantly, opens up the possibility for dynamically fusing modalities to achieve distinct power-performance trade-offs at run time.
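The four fixed fusion rules named in the abstract (Sum, Product, Majority Voting, Borda Count) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-breaking choice (lowest class index wins) and the toy probability vectors are assumptions made for illustration only.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value; the lowest index wins ties (an assumption)."""
    return max(range(len(values)), key=lambda i: values[i])


def sum_rule(probs):
    """Add the class probabilities of all classifiers and pick the largest sum."""
    n_classes = len(probs[0])
    return argmax([sum(p[c] for p in probs) for c in range(n_classes)])


def product_rule(probs):
    """Multiply the class probabilities of all classifiers and pick the largest product."""
    n_classes = len(probs[0])
    scores = []
    for c in range(n_classes):
        score = 1.0
        for p in probs:
            score *= p[c]
        scores.append(score)
    return argmax(scores)


def majority_vote(probs):
    """Each classifier votes for its top class; the most voted class wins."""
    votes = Counter(argmax(p) for p in probs)
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


def borda_count(probs):
    """Each classifier ranks the classes; a class earns points equal to its rank
    (0 for the least likely class), and the class with the most points wins."""
    n_classes = len(probs[0])
    points = [0] * n_classes
    for p in probs:
        ascending = sorted(range(n_classes), key=lambda c: p[c])
        for rank, c in enumerate(ascending):
            points[c] += rank
    return argmax(points)


# Hypothetical outputs of three mono-modal classifiers over two classes.
toy = [
    [0.9, 0.1],  # "motion" classifier, very confident in class 0
    [0.4, 0.6],  # "sound" classifier, mildly prefers class 1
    [0.4, 0.6],  # "vision" classifier, mildly prefers class 1
]
for rule in (sum_rule, product_rule, majority_vote, borda_count):
    print(rule.__name__, rule(toy))
```

In this toy case the score-based rules (Sum, Product) follow the single confident classifier, while the rank-based rules (Majority Voting, Borda Count) follow the two mildly confident ones, which is one reason such rules can disagree in practice.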

“…Many machine learning approaches have been proposed to fuse multimodal information for classification tasks [31], [33], [39], [40]. These approaches can be categorized as early integration (data-layer fusion), late integration (decision-layer fusion).…”

Section: Introduction (mentioning)

confidence: 99%

Transportation mode recognition fusing wearable motion, sound and vision sensors

et al. 2020

“…There are many recent examples of the use of autoencoders for such a purpose, i.e. in the field of robotics [27][28][29]. An advantage of multimodal autoencoders is that they can produce a vector of parameters based on the fusion of data originating from two or more different modalities.…”

Section: Methods (mentioning)

confidence: 99%

Multifactor consciousness level assessment of participants with acquired brain injuries employing human–computer interfaces

et al. 2020

Background: A lack of communication with people suffering from acquired brain injuries may lead to drawing erroneous conclusions regarding the diagnosis or therapy of patients. Information technology and neuroscience make it possible to enhance the diagnostic and rehabilitation process of patients with traumatic brain injury or posthypoxia. In this paper, we present a new method for evaluating the possibility of communication and assessing such patients' state, employing future generation computers extended with advanced human-machine interfaces.

Methods: First, the hearing abilities of 33 participants in the state of coma were evaluated using auditory brainstem response measurements (ABR). Next, a series of interactive computer-based exercise sessions were performed with the therapist's assistance. Participants' actions were monitored with an eye-gaze tracking (EGT) device and with an electroencephalogram (EEG) monitoring headset. The data gathered were processed with the use of data clustering techniques.

Results: Analysis showed that the data gathered and the computer-based methods developed for their processing are suitable for evaluating the participants' responses to stimuli. Parameters obtained from EEG signals and eye-tracker data were correlated with Glasgow Coma Scale (GCS) scores and enabled separation between GCS-related classes. The results show that in the EEG and eye-tracker signals, there are specific consciousness-related states discoverable. We observe them as outliers in diagrams on the decision space generated by the autoencoder. For this reason, the numerical variable that separates particular groups of people with the same GCS is the variance of the distance of points from the cluster center that the autoencoder generates. The higher the GCS score, the greater the variance in most cases. The results proved to be statistically significant in this context.

Conclusions: The results indicate that the method proposed may help to assess the consciousness state of participants in an objective manner.
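The separating variable described in the Results (the variance of the distance of points from the cluster center) can be sketched as follows. This is only an illustration under stated assumptions: the cluster center is taken to be the arithmetic mean of the points, the distance is Euclidean, and the population variance is used; the abstract does not specify any of these, and the function name and sample points are hypothetical.

```python
import math
import statistics


def distance_variance(points):
    """Variance of the Euclidean distances from each point to the centroid.

    Assumes the cluster center is the arithmetic mean of the points and that
    population variance is intended; neither is confirmed by the abstract.
    """
    dim = len(points[0])
    center = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    distances = [math.dist(tuple(p), center) for p in points]
    return statistics.pvariance(distances)


# Hypothetical 2-D latent points: a tight symmetric cluster and one with an outlier.
compact = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
with_outlier = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
print(distance_variance(compact))
print(distance_variance(with_outlier))
```

Points that sit far from the rest of the cluster raise this variance, which matches the abstract's description of consciousness-related states appearing as outliers in the autoencoder's decision space.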

“…Feature extraction using DCNN models has achieved promising results in extracting high-level features for different classification tasks [23]-[25]. Since fine-tuning of well-established DCNN architectures has not previously achieved good performance on this dataset, for this study, we employ the DCNN descriptor approach [26]-[28] to extract features in order to represent the discriminative characteristics of different classes sufficiently.…”

Section: A Network Architecture (mentioning)

confidence: 99%

Breast Cancer Diagnosis with Transfer Learning and Global Pooling

Kassani, Wesolowski, et al. 2019

2019 International Conference on Information and Communication Technology Convergence (ICTC)

Breast cancer is one of the most common causes of cancer-related death in women worldwide. Early and accurate diagnosis of breast cancer may significantly increase the survival rate of patients. In this study, we aim to develop a fully automatic, deep learning-based method using descriptor features extracted by Deep Convolutional Neural Network (DCNN) models and pooling operation for the classification of hematoxylin and eosin (H&E) stained histological breast cancer images provided as a part of the International Conference on Image Analysis and Recognition (ICIAR) 2018 Grand Challenge on BreAst Cancer Histology (BACH) Images. Different data augmentation methods are applied to optimize the DCNN performance. We also investigated the efficacy of different stain normalization methods as a pre-processing step. The proposed network architecture using a pre-trained Xception model yields 92.50% average classification accuracy.
