Learning Fused Representations for Large-Scale Multimodal Classification

Nawaz, Shah; Calefati, Alessandro; Janjua, Muhammad Kamran; Anwaar, Muhammad Umer; Gallo, Ignazio

doi:10.1109/lsens.2018.2880790

Cited by 11 publications

(3 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We trained a custom CNN on the image-like embeddings and performed a grid search to tune its hyperparameters. The approach of encoding and stacking multimodal features into a single source suitable for training CNNs was inspired by Nawaz et al who fused image and text embeddings to improve classification performance [50].…”

Section: Treatment Prediction Using Deep Neural Networkmentioning

confidence: 99%

A multimodal dataset for precision oncology in head and neck cancer

Dörrich,

Balk,

Heusinger

et al. 2024

Preprint

View full text Add to dashboard Cite

Head and neck cancer is a common disease and is associated with a poor prognosis. A promising approach to improving patient outcomes is personalized treatment, which uses information from a variety of modalities. However, only little progress has been made due to the lack of large public datasets. We present a multimodal dataset, HANCOCK, that comprises monocentric, real-world data of 763 head and neck cancer patients. Our dataset contains demographical, pathological, and blood data as well as surgery reports and histologic images. We show its potential clinical impact in a multimodal machine-learning setting by proposing adjuvant treatment for previously unidentified risk patients. We found that especially the multimodal model outperformed single-modality models (area under the curve (AUC): 0.85). We believe that HANCOCK will not only open new insights into head and neck cancer pathology but also serve as a major source for researching multimodal machine-learning methodologies in precision oncology.

show abstract

Section: Treatment Prediction Using Deep Neural Networkmentioning

confidence: 99%

A multimodal dataset for precision oncology in head and neck cancer

Dörrich,

Balk,

Heusinger

et al. 2024

Preprint

View full text Add to dashboard Cite

show abstract

“…Several methods have been proposed to fuse image and textual features. Encoded textual features to the image domain and passing the resulting image through a CNN have been shown to improve classification performance [63][64][65]. Furthermore, transformer-based models have been successfully combined for multimodal classification [66].…”

Section: Multimodal Classificationmentioning

confidence: 99%

Multimodal Fine-Grained Grocery Product Recognition using Image and OCR Text

Pettersson,

Riveiro,

Löfström

2023

Preprint

View full text Add to dashboard Cite

Automatic recognition of grocery products can be used to improve customer flow at checkouts and reduce labor costs and store losses. Product recognition is, however, a challenging task for machine learning-based solutions due to the large number of products and their variations in appearance. In this work, we tackle the challenge of fine-grained product recognition by first extracting a large dataset from a grocery store containing products that are only differentiable by subtle details. Then, we propose a multimodal product recognition approach that uses product images with extracted OCR text from packages to improve fine-grained recognition of grocery products. We evaluate several image and text models separately and then combine them using different multimodal models of varying complexities. The results show that image and textual information complement each other in multimodal models and enable a classifier with greater recognition performance than unimodal models, especially when the number of training samples is limited. Therefore, this approach is suitable for many different scenarios in which product recognition is used to further improve recognition performance.

show abstract

“…Clustering means that the samples with large similarity in data are gathered into a category in terms of similarity criterion and express as a local area in feature space [43], [44]. It can discover and mine the arbitrary shape class clusters in remote sensing data and has great potential for classification of remote sensing data [45]- [47].…”

Section: Introductionmentioning

confidence: 99%

Clustering of Remote Sensing Data Based on K-Nearest Neighbors Sampling With Non-Evenly Division

Liu

Sun

et al. 2019

IEEE Access

View full text Add to dashboard Cite

Data computation and traffic is the key step to rapid analysis and intelligent transportation application based on remote sensing data. To tackle the low computing efficiency and high storage cost in the analysis of remote sensing and improve the computational performance quickly, this paper proposed a new processing method of remote sensing data based on k-nearest neighbors (KNN) sampling with nonevenly division. In the method, we first sort and preprocess the original dataset in terms of any size of one-dimension and segment the sample dataset by non-evenly division. Then the samples with the range of boundary width are reserved, and a new local unsampled mapping table is reconstructed. Next, we traverse the subset and compute the distance matrix by Euclidean distance and the local density with descending order, and further determine whether the sample belongs to boundary sample in accordance with distance matrix and local density. We then construct the sampling dataset and combine again and achieve the processing result via adding the entire unsampled mapping table to the sample dataset. Finally, the current study is tested and verified by the simulation data and true traffic jam prediction case. Our experiments present that the proposed method not only can record precisely the correspondence relations between samples and unsampled data by the KNN sampling with non-evenly division and ensure the accuracy of clustering results, but also significantly reduce the data traffic and effectively improve the memory utilization. The result reveals that the proposed method can potentially contribute to the data analysis of remote sensing data and prediction of traffic jam with large scale and high real-time performance.

show abstract

Learning Fused Representations for Large-Scale Multimodal Classification

Cited by 11 publications

References 9 publications

A multimodal dataset for precision oncology in head and neck cancer

A multimodal dataset for precision oncology in head and neck cancer

Multimodal Fine-Grained Grocery Product Recognition using Image and OCR Text

Clustering of Remote Sensing Data Based on K-Nearest Neighbors Sampling With Non-Evenly Division

Contact Info

Product

Resources

About