2021
DOI: 10.48550/arxiv.2109.09020
Preprint

Multimodal Classification: Current Landscape, Taxonomy and Future Directions

Abstract: Multimodal classification research has been gaining popularity in many domains that collect more data from multiple sources including satellite imagery, biometrics, and medicine. However, the lack of consistent terminology and architectural descriptions makes it difficult to compare different existing solutions. We address these challenges by proposing a new taxonomy for describing such systems based on trends found in recent publications on multimodal classification. Many of the most difficult aspects of unim…

Cited by 1 publication (3 citation statements)
References 87 publications (141 reference statements)
“…Multi-View (MV) classification seeks to combine characteristics from different sources (here, the image types). The accuracy of object identification increases due to the diversity of the features that are extracted from the different sources and fused [17], [21]. Thus, the performance of a DL model can be improved by optimizing multiple functions, one per image type.…”
Section: B. Proposed Approach
confidence: 99%
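
The statement above describes per-view optimization: the model is trained with one objective for each image type in addition to a fused prediction. Below is a minimal sketch of that idea, assuming PyTorch; all class and variable names (ViewBranch, MultiViewClassifier, num_views, feat_dim) are illustrative and not taken from the cited works.

import torch
import torch.nn as nn

class ViewBranch(nn.Module):
    """Feature extractor for a single view (image type)."""
    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):
        return self.backbone(x)

class MultiViewClassifier(nn.Module):
    """One branch and one auxiliary head per view, plus a fused head,
    so that one loss function can be optimized per image type."""
    def __init__(self, num_views: int, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.branches = nn.ModuleList([ViewBranch(3, feat_dim) for _ in range(num_views)])
        self.view_heads = nn.ModuleList([nn.Linear(feat_dim, num_classes) for _ in range(num_views)])
        self.fused_head = nn.Linear(num_views * feat_dim, num_classes)

    def forward(self, views):
        # views: list of tensors, one per image type, each of shape (B, 3, H, W)
        feats = [branch(v) for branch, v in zip(self.branches, views)]
        view_logits = [head(f) for head, f in zip(self.view_heads, feats)]
        fused_logits = self.fused_head(torch.cat(feats, dim=1))
        return fused_logits, view_logits

# The total loss combines the fused objective with one objective per view.
model = MultiViewClassifier(num_views=3, num_classes=10)
views = [torch.randn(4, 3, 64, 64) for _ in range(3)]
labels = torch.randint(0, 10, (4,))
fused_logits, view_logits = model(views)
criterion = nn.CrossEntropyLoss()
loss = criterion(fused_logits, labels) + sum(criterion(v, labels) for v in view_logits)
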
“…Finally, the baseline model and the modified version with attention were used to train the MV fusion models without and with attention, respectively. To fuse the features, two late-fusion strategies are explored: feature concatenation, and max-pooling of the individual features produced by each view [16], [17].…”
Section: B. Proposed Approach
confidence: 99%
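
The two late-fusion strategies mentioned here can be summarized as follows, assuming each view has already produced a fixed-size feature vector; the helper names (fuse_concat, fuse_maxpool) and the shapes are illustrative, not taken from the cited paper.

import torch

def fuse_concat(view_feats):
    """Late fusion by concatenation: V tensors of shape (B, D) -> (B, V*D)."""
    return torch.cat(view_feats, dim=1)

def fuse_maxpool(view_feats):
    """Late fusion by max-pooling: element-wise maximum over views -> (B, D)."""
    return torch.stack(view_feats, dim=0).max(dim=0).values

# Example with 3 views, a batch of 4 samples, and 128-dimensional features.
feats = [torch.randn(4, 128) for _ in range(3)]
print(fuse_concat(feats).shape)   # torch.Size([4, 384])
print(fuse_maxpool(feats).shape)  # torch.Size([4, 128])

As a design note, concatenation preserves all per-view information but its dimensionality grows with the number of views, whereas max-pooling keeps a fixed feature size and is invariant to the order of the views.
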