Research progress in multimodal learning has grown rapidly over the last decade, especially in computer vision. The growing availability of multimodal data streams and advances in deep learning algorithms have driven the increasing adoption of deep multimodal learning, which involves developing models capable of processing and analyzing multimodal information in a unified way. Unstructured real-world data inherently takes many forms, known as modalities, often including visual and textual content. Extracting relevant patterns from such data remains a motivating goal for deep learning researchers. In this paper, we seek to improve the computer vision community's understanding of the key concepts and algorithms of deep multimodal learning by exploring how to build deep models that integrate and combine heterogeneous visual cues across sensory modalities. In particular, we summarize six perspectives from the current literature: multimodal data representation, multimodal fusion (both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot learning. We also survey current multimodal applications and present a collection of benchmark datasets for problems in various vision domains. Finally, we highlight the limitations and challenges of deep multimodal learning and provide insights and directions for future research.
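As a toy illustration of the feature-level (early) fusion scheme the survey above discusses, the following NumPy sketch concatenates hypothetical pre-extracted image and text feature vectors and classifies the fused vector with a single linear layer. This is not code from the survey; all feature dimensions, weights, and the three-class setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted unimodal features for one sample
image_feat = rng.standard_normal(512)   # e.g. from a CNN image backbone
text_feat = rng.standard_normal(300)    # e.g. from a word-embedding text encoder

# Early (feature-level) fusion: concatenate the modality representations
fused = np.concatenate([image_feat, text_feat])  # shape (812,)

# A single linear layer mapping the fused vector to 3 hypothetical classes
W = rng.standard_normal((3, fused.size)) * 0.01
b = np.zeros(3)
logits = W @ fused + b

# Softmax over the class logits gives a probability per class
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(fused.shape, probs.shape)
```

Late fusion, by contrast, would run a separate classifier per modality and merge the per-modality predictions instead of the features; the survey covers both families of schemes.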
The novel coronavirus disease (COVID-19), which first appeared at the end of December 2019, continues to spread rapidly in most countries of the world. Respiratory infection is the primary manifestation in the majority of COVID-19 patients. In light of the growing number of COVID-19 cases, diagnostic tools that identify COVID-19 infection at an early stage are of vital importance. For decades, chest X-ray (CXR) technologies have proven their ability to accurately detect respiratory diseases. More recently, with the availability of COVID-19 CXR scans, deep learning algorithms have played a critical role in healthcare by helping radiologists recognize COVID-19 patients from their CXR images. However, the majority of COVID-19 screening methods reported in recent studies are based on 2D convolutional neural networks (CNNs). Although 3D CNNs capture contextual information better than their 2D counterparts, their use is limited by their increased computational cost (i.e., they require substantially more memory and computing power). In this study, a transfer learning-based hybrid 2D/3D CNN architecture for COVID-19 screening using CXRs is developed. The proposed architecture incorporates a pre-trained deep model (VGG16) and a shallow 3D CNN, combined with a depth-wise separable convolution layer and a spatial pyramid pooling (SPP) module. Specifically, the depth-wise separable convolution helps preserve useful features while reducing the computational burden of the model, and the SPP module extracts multi-level representations from intermediate feature maps. Experimental results show that the proposed framework achieves reasonable performance when evaluated on a collected dataset with three classes (COVID-19, Pneumonia, and Normal). Notably, it achieved a sensitivity of 98.33%, a specificity of 98.68%, and an overall accuracy of 96.91%.
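The two building blocks named in this abstract, depth-wise separable convolution and spatial pyramid pooling, can be sketched in plain NumPy. This is a minimal illustration of the general operations, not the paper's implementation; all shapes, kernel sizes, and pyramid levels are assumptions chosen for the example.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable conv: a per-channel 3x3 conv followed by a
    1x1 pointwise conv that mixes channels.
    x: (C, H, W); dw_kernels: (C, 3, 3); pw_kernels: (C_out, C)."""
    C, H, W = x.shape
    out = np.zeros((C, H - 2, W - 2))
    # Depthwise step: each channel is convolved with its own kernel
    # (valid padding), so cost scales with C rather than C * C_out
    for c in range(C):
        for i in range(H - 2):
            for j in range(W - 2):
                out[c, i, j] = np.sum(x[c, i:i + 3, j:j + 3] * dw_kernels[c])
    # Pointwise step: 1x1 conv combines channels into C_out maps
    return np.tensordot(pw_kernels, out, axes=([1], [0]))

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) map over each level's n x n grid and
    concatenate, producing a fixed-length multi-level descriptor."""
    C, H, W = x.shape
    feats = []
    for n in levels:
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = x[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                feats.append(cell.max(axis=(1, 2)))
    return np.concatenate(feats)  # length C * sum(n * n for n in levels)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 10, 10))          # a small 4-channel feature map
y = depthwise_separable_conv(x, rng.standard_normal((4, 3, 3)),
                             rng.standard_normal((8, 4)))
v = spatial_pyramid_pool(y)
print(y.shape, v.shape)  # (8, 8, 8) and (168,): 8 channels * (1 + 4 + 16) cells
```

The depthwise factorization is why the layer is cheap: a standard 3x3 conv from C to C_out channels costs roughly C * C_out * 9 multiplies per pixel, while the factorized version costs C * 9 + C * C_out. The SPP output length depends only on the channel count and pyramid levels, not on the spatial size of the input map.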