2019
DOI: 10.1007/978-3-030-11024-6_44
CentralNet: A Multilayer Approach for Multimodal Fusion

Abstract: This paper proposes a novel multimodal fusion approach, aiming to produce the best possible decisions by integrating information coming from multiple media. While most past multimodal approaches work either by projecting the features of the different modalities into the same space, or by coordinating the representations of each modality through the use of constraints, our approach borrows from both visions. More specifically, assuming each modality can be processed by a separate deep convolutional network, all…

Cited by 83 publications (74 citation statements)
References 21 publications
“…Method                Modalities     F1-W     F1-M
Unimodal baselines for fusion:
Maxout MLP [12]        text           0.5754   0.4598
VGG Transfer           image          0.4921   0.3350
Explicit fusion:
Two-stream [39]        image + text   0.6081   0.5049
GMU [3]                image + text   0.6170   0.5410
CentralNet [42]        image + text   0.6223   0.5344
Ours Top 1             image + text   0.6250   0.5568

We observed that all multimodal fusion networks largely improve over the unimodal networks, but our automatically found fusion architecture is the one with the best overall score. This was found after three iterations of progressive search and L = 4.…”
Section: Methods (mentioning)
confidence: 99%
“…Vectorized features from different sources of knowledge can be combined in deep learning using a simple operation, such as concatenation or a weighted sum, which often involves only a few or even no parameters, because the joint training of the deep models will adapt the high-level feature-extraction layers to compensate for the simplicity of the fusion step. Concatenation may be used to combine either low-level input features [6], [7] or high-level features derived from pre-trained models [8], [9]. The proposed model uses the first technique, i.e., simple operation-based fusion, where the vectorized features from images are integrated using concatenation.…”
Section: Related Work (mentioning)
confidence: 99%
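
The operation-based fusion described in this statement can be sketched in a few lines; the following is a minimal illustration assuming NumPy, with made-up feature names and sizes (none are taken from the cited work):

```python
import numpy as np

def concat_fusion(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    # Operation-based fusion by concatenation: the step itself has no
    # learned parameters; the layers trained on top of the fused vector
    # adapt to it during joint training.
    return np.concatenate([feat_a, feat_b], axis=-1)

# Hypothetical sizes: a 512-d image feature and a 300-d text feature.
image_feat = np.random.randn(512)
text_feat = np.random.randn(300)
fused = concat_fusion(image_feat, text_feat)
print(fused.shape)  # (812,)
```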
“…Hence, before the fusion of the two feature vectors, the output from ResNet50 has to be resized to the output shape of VGG16. The two feature vectors, which then have the same shape, are fused together either by addition or by multiplication, as defined in equation (8).…”
Section: B. Proposed Approach (mentioning)
confidence: 99%
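
This resize-then-fuse step can be illustrated as follows; since equation (8) is not reproduced in the excerpt, the linear projection used for resizing is an assumption, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled feature sizes: ResNet50 (2048-d) and VGG16 (4096-d).
resnet_feat = rng.standard_normal(2048)
vgg_feat = rng.standard_normal(4096)

# Resize ResNet50's output to VGG16's shape. A random linear projection
# stands in here for whatever resizing the cited work actually applies.
W = rng.standard_normal((4096, 2048)) / np.sqrt(2048)
resnet_resized = W @ resnet_feat

# Element-wise fusion, either by addition or by multiplication.
fused_add = resnet_resized + vgg_feat
fused_mul = resnet_resized * vgg_feat
```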
“…CentralNet Architecture. The architecture of CentralNet [20] is a neural network which combines the features issued from the different modalities by taking, as input to each one of its layers, a weighted sum of the corresponding layers of the unimodal networks and of its own previous layer. This is illustrated in Figure 1(c).…”
Section: CentralNet (mentioning)
confidence: 99%
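
The weighted-sum rule quoted above can be written compactly for one central layer; this is a minimal sketch assuming NumPy, two modalities, scalar weights, and a ReLU stand-in for the nonlinearity (all names are illustrative):

```python
import numpy as np

def central_layer(h_central_prev, h_img, h_txt, alpha_c, alpha_img, alpha_txt):
    # One CentralNet-style central layer: a weighted sum of the previous
    # central activation and the same-depth unimodal activations,
    # followed by a nonlinearity.
    z = alpha_c * h_central_prev + alpha_img * h_img + alpha_txt * h_txt
    return np.maximum(z, 0.0)  # ReLU

# Hypothetical 128-d activations taken from the two unimodal networks.
h_c = central_layer(np.zeros(128), np.random.randn(128), np.random.randn(128),
                    alpha_c=0.5, alpha_img=0.8, alpha_txt=0.8)
```

In the full model the alpha weights are trainable, so the network can learn how much each modality, and the central path itself, contributes at every depth.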
“…The approach is multi-objective in the sense that it simultaneously tries to minimize the per-modality losses as well as the global loss defined on the joint space. This article is an extension of [20].…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
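
That multi-objective training can be made concrete as a single combined loss; this sketch assumes two modalities and a hypothetical balancing weight beta, which the excerpt does not specify:

```python
def centralnet_total_loss(loss_central, loss_img, loss_txt, beta=1.0):
    # Multi-objective training objective: the global loss computed on the
    # joint (central) prediction plus the per-modality losses, so each
    # unimodal branch stays predictive while the fusion is optimized.
    return loss_central + beta * (loss_img + loss_txt)
```

Minimizing the summed objective optimizes all branches simultaneously, which is what makes the approach multi-objective rather than fusion-only.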