2021
DOI: 10.48550/arxiv.2109.09020
Preprint

Multimodal Classification: Current Landscape, Taxonomy and Future Directions

Abstract: Multimodal classification research has been gaining popularity in many domains that collect more data from multiple sources including satellite imagery, biometrics, and medicine. However, the lack of consistent terminology and architectural descriptions makes it difficult to compare different existing solutions. We address these challenges by proposing a new taxonomy for describing such systems based on trends found in recent publications on multimodal classification. Many of the most difficult aspects of unim…

Cited by 1 publication (3 citation statements)
References 87 publications (141 reference statements)
“…Multi-View (MV) classification seeks to combine characteristics from different sources (here, the image types). The accuracy of object identification increases due to the diversity of the features that are extracted from the different sources and fused [17], [21]. Thus, the performance of a DL model can be improved by optimizing multiple functions, one per image type.…”
Section: B. Proposed Approach
confidence: 99%
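
The statement above describes per-view optimization: the model is trained with one objective for each image type in addition to a fused prediction. Below is a minimal sketch of that idea, assuming PyTorch; all class and variable names (ViewBranch, MultiViewClassifier, num_views, feat_dim) are illustrative and not taken from the cited works.

import torch
import torch.nn as nn

class ViewBranch(nn.Module):
    """Feature extractor for a single view (image type)."""
    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):
        return self.backbone(x)

class MultiViewClassifier(nn.Module):
    """One branch and one auxiliary head per view, plus a fused head,
    so that one loss function can be optimized per image type."""
    def __init__(self, num_views: int, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.branches = nn.ModuleList([ViewBranch(3, feat_dim) for _ in range(num_views)])
        self.view_heads = nn.ModuleList([nn.Linear(feat_dim, num_classes) for _ in range(num_views)])
        self.fused_head = nn.Linear(num_views * feat_dim, num_classes)

    def forward(self, views):
        # views: list of tensors, one per image type, each of shape (B, 3, H, W)
        feats = [branch(v) for branch, v in zip(self.branches, views)]
        view_logits = [head(f) for head, f in zip(self.view_heads, feats)]
        fused_logits = self.fused_head(torch.cat(feats, dim=1))
        return fused_logits, view_logits

# The total loss combines the fused objective with one objective per view.
model = MultiViewClassifier(num_views=3, num_classes=10)
views = [torch.randn(4, 3, 64, 64) for _ in range(3)]
labels = torch.randint(0, 10, (4,))
fused_logits, view_logits = model(views)
criterion = nn.CrossEntropyLoss()
loss = criterion(fused_logits, labels) + sum(criterion(v, labels) for v in view_logits)
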
“…Finally, the baseline model and the modified version with attention were used to train the MV fusion models without and with attention, respectively. To fuse the features, two late-fusion strategies are explored: feature concatenation, and max-pooling of the individual features produced by each view [16], [17].…”
Section: B. Proposed Approach
confidence: 99%
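
The two late-fusion strategies mentioned here can be summarized as follows, assuming each view has already produced a fixed-size feature vector; the helper names (fuse_concat, fuse_maxpool) and the shapes are illustrative, not taken from the cited paper.

import torch

def fuse_concat(view_feats):
    """Late fusion by concatenation: V tensors of shape (B, D) -> (B, V*D)."""
    return torch.cat(view_feats, dim=1)

def fuse_maxpool(view_feats):
    """Late fusion by max-pooling: element-wise maximum over views -> (B, D)."""
    return torch.stack(view_feats, dim=0).max(dim=0).values

# Example with 3 views, a batch of 4 samples, and 128-dimensional features.
feats = [torch.randn(4, 128) for _ in range(3)]
print(fuse_concat(feats).shape)   # torch.Size([4, 384])
print(fuse_maxpool(feats).shape)  # torch.Size([4, 128])

As a design note, concatenation preserves all per-view information but its dimensionality grows with the number of views, whereas max-pooling keeps a fixed feature size and is invariant to the order of the views.
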