Mahdi Abavisani scite author profile

We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. Instead of explicitly combining multimodal information, which is commonplace in many state-of-the-art methods, we propose a different framework in which we embed the knowledge of multiple modalities in individual networks so that each unimodal network can achieve improved performance. In particular, we dedicate a separate network to each available modality and encourage the networks to collaborate so that they learn common semantics and better representations. We introduce a "spatiotemporal semantic alignment" (SSA) loss to align the content of the features from different networks. In addition, we regularize this loss with our proposed "focal regularization parameter" to avoid negative knowledge transfer. Experimental results show that our framework improves the test-time recognition accuracy of unimodal networks and achieves state-of-the-art performance on various dynamic hand gesture recognition datasets.

Multimodal Categorization of Crisis Events in Social Media

Abavisani

Hu

et al. 2020

Multimodal sparse and low-rank subspace clustering

Abavisani

Patel

2018

Information Fusion

In2I: Unsupervised Multi-Image-to-Image Translation Using Generative Adversarial Networks

Perera

Abavisani

Patel

2018

In unsupervised image-to-image translation, the goal is to learn the mapping between an input image and an output image using a set of unpaired training images. In this paper, we propose an extension of the unsupervised image-to-image translation problem to a multiple-input setting. Given a set of paired images from multiple modalities, a transformation is learned to translate the input into a specified domain. For this purpose, we introduce a Generative Adversarial Network (GAN) based framework along with a multi-modal generator structure and a new loss term, the latent consistency loss. Through various experiments, we show that leveraging multiple inputs generally improves the visual quality of the translated images. Moreover, we show that the proposed method outperforms current state-of-the-art unsupervised image-to-image translation methods.

Mahdi Abavisani

Deep Multimodal Subspace Clustering Networks

Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition With Multimodal Training

Multimodal Categorization of Crisis Events in Social Media

Multimodal sparse and low-rank subspace clustering

In2I: Unsupervised Multi-Image-to-Image Translation Using Generative Adversarial Networks
