J. Zico Kolter scite author profile

Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is the directional pairwise crossmodal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that the proposed crossmodal attention mechanism in MulT can capture correlated crossmodal signals.


Learning to detect malicious executables in the wild

Kolter

Maloof

2004

534

595


Towards fully autonomous driving: Systems and algorithms

Levinson et al. 2011


Using additive expert ensembles to cope with concept drift

2005


Near-Bayesian exploration in polynomial time

Kolter

2009

137

154


A control architecture for quadruped locomotion over rough terrain

2008


Regularization and feature selection in least-squares temporal difference learning

Kolter

2009

115

116




J. Zico Kolter

Dynamic weighted majority: a new ensemble method for tracking concept drift

Multimodal Transformer for Unaligned Multimodal Language Sequences
