2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00713

MFAS: Multimodal Fusion Architecture Search

Abstract: We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. In order to find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored for the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem by extensive experimentation on …

Citations: cited by 151 publications (123 citation statements)
References: 37 publications
“…• For weighted sum with scalar weights, an iterative method is proposed [125] that requires the pre-trained vector representations for each modality to have the same number of elements arranged in an order that is suitable for element-wise addition. This is often achieved by jointly training a fully connected layer for dimension control and reordering for each modality, together with the scalar weights for fusion.…”
Section: A Simple Operation-based Fusion
Citation type: mentioning
confidence: 99%
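
The weighted-sum fusion described in this citation statement can be sketched roughly as follows. This is a minimal illustration, not the cited method: the helper names (`linear`, `weighted_sum_fusion`, `random_projection`), the feature sizes, and the fixed scalar weights are all assumptions made for the example, and in practice the projection layers and the scalar weights would be trained jointly rather than set by hand.

```python
import random

def linear(x, weights, bias):
    """Apply a fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def weighted_sum_fusion(features, projections, alphas):
    """Project each modality to a shared size, then add with scalar weights."""
    projected = [linear(x, W, b) for x, (W, b) in zip(features, projections)]
    dim = len(projected[0])
    # Element-wise addition only makes sense if every modality has the same size.
    assert all(len(p) == dim for p in projected), "output sizes must match"
    return [sum(a * p[i] for a, p in zip(alphas, projected)) for i in range(dim)]

def random_projection(in_dim, out_dim, rng):
    """Hypothetical stand-in for a trained dimension-control layer."""
    W = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return W, b

rng = random.Random(0)
image_feat = [0.2, -0.1, 0.5, 0.3]  # hypothetical 4-d image feature
audio_feat = [1.0, 0.4]             # hypothetical 2-d audio feature
projections = [random_projection(4, 3, rng), random_projection(2, 3, rng)]
alphas = [0.7, 0.3]                 # scalar fusion weights (learned in practice)

fused = weighted_sum_fusion([image_feat, audio_feat], projections, alphas)
print(len(fused))
```

The per-modality layer plays the "dimension control and reordering" role from the quote: it maps features of different sizes into one shared space so that the scalar-weighted element-wise sum is well defined.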
“…Recently, one-shot NAS methods have been proposed to eliminate the meta-controller by modeling the NAS problem as a single training process of an over-parameterized supernet that comprises all candidate paths [5,7,32,52]. The most closely related study to our work is the MFAS approach [39], which also incorporates NAS to search the optimal architecture for multimodal tasks. However, MFAS focuses on a simpler problem to search for the multimodal fusion model given two input features, which cannot be directly used to address the multimodal learning tasks in this paper.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Vielzeuf et al. proposed CentralNet, which converges different modality features step by step by using several levels of interim features available in each modality network [35]. There was also an attempt to use reinforcement learning-based AutoML to find the optimal fusion network architecture [38]. AutoML is effective in finding the optimal combination of hyper-parameters from each network layer and the layer from which the features of each modality are extracted.…”
Section: Multimodal Deep Learning
Citation type: mentioning
confidence: 99%