Research in multimodal learning has progressed rapidly over the last decade in several areas, especially computer vision. The growing potential of multimodal data streams and deep learning algorithms has contributed to the increasing adoption of deep multimodal learning. This field involves developing models that can process and analyze multimodal information in a unified way. Unstructured real-world data inherently take many forms, known as modalities, often including visual and textual content. Extracting relevant patterns from such data remains a motivating goal for deep learning researchers. In this paper, we seek to improve the computer vision community's understanding of the key concepts and algorithms of deep multimodal learning by exploring how to build deep models that integrate and combine heterogeneous visual cues across sensory modalities. In particular, we summarize six perspectives from the current literature on deep multimodal learning: multimodal data representation, multimodal fusion (both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot learning. We also survey current multimodal applications and present a collection of benchmark datasets for solving problems in various vision domains. Finally, we highlight the limitations and challenges of deep multimodal learning and provide insights and directions for future research.
The novel coronavirus disease (COVID-19), which first appeared at the end of December 2019, continues to spread rapidly in most countries of the world. Most patients treated for COVID-19 present primarily with respiratory infections. Given the growing number of COVID-19 cases, diagnostic tools that identify COVID-19 infection at early stages are of vital importance. For decades, chest X-ray (CXR) imaging has proven able to detect respiratory diseases accurately. More recently, with the availability of COVID-19 CXR scans, deep learning algorithms have played a critical role in healthcare by helping radiologists recognize COVID-19 patients from their CXR images. However, most of the COVID-19 screening methods reported in recent studies are based on 2D convolutional neural networks (CNNs). Although 3D CNNs can capture more contextual information than their 2D counterparts, their use is limited by their higher computational cost (i.e., they require considerably more memory and computing power). This study develops a transfer learning-based hybrid 2D/3D CNN architecture for COVID-19 screening using CXRs. The proposed architecture combines a pre-trained deep model (VGG16) and a shallow 3D CNN with a depth-wise separable convolution layer and a spatial pyramid pooling (SPP) module. Specifically, the depth-wise separable convolution helps preserve useful features while reducing the computational burden of the model. The SPP module is designed to extract multi-level representations from intermediate feature maps. Experimental results show that the proposed framework achieves reasonable performance when evaluated on a collected dataset with three classes to predict (COVID-19, Pneumonia, and Normal). Notably, it achieved a sensitivity of 98.33%, a specificity of 98.68%, and an overall accuracy of 96.91%.
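To illustrate why a depth-wise separable convolution lightens a model and how an SPP module produces multi-level, fixed-length representations, here is a minimal pure-Python sketch. It is not the authors' implementation: the helper names `conv_params`, `separable_params`, and `spp`, the depth multiplier of 1, the (1, 2, 4) pyramid levels, and the example channel counts are illustrative assumptions, and bias terms are ignored in the parameter counts.

```python
import math


def conv_params(k_elems, c_in, c_out):
    """Weights of a standard convolution (bias ignored).

    k_elems is the number of kernel elements: 9 for 3x3, 27 for 3x3x3.
    """
    return k_elems * c_in * c_out


def separable_params(k_elems, c_in, c_out):
    """Depth-wise filter per input channel (depth multiplier 1), then a 1x1 point-wise conv."""
    return k_elems * c_in + c_in * c_out


def spp(feature_maps, levels=(1, 2, 4)):
    """Spatial pyramid max pooling over a list of 2D channels -> fixed-length vector.

    For each level n, every channel is split into an n x n grid of bins and the
    maximum of each bin is kept, so the output has len(feature_maps) * sum(n * n)
    entries regardless of the input's height and width.
    """
    out = []
    for n in levels:
        for fmap in feature_maps:
            h, w = len(fmap), len(fmap[0])
            for i in range(n):
                # Row span of bin i; floor/ceil bounds guarantee a non-empty bin.
                r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
                for j in range(n):
                    c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                    out.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out
```

With a 3 x 3 kernel (9 elements) and 512 input and output channels, the separable layer needs 266,752 weights versus 2,359,296 for a standard convolution, roughly 8.8 times fewer; for a 3 x 3 x 3 kernel in a 3D layer, pass 27 instead. Because each assumed pyramid level contributes a fixed number of bins per channel (1 + 4 + 16 = 21), the SPP output length depends only on the channel count, not on the spatial size of the input.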
The partitioning of an image into several constituent components is called image segmentation. Many approaches have been developed for this task; one widely used approach is the particle swarm optimization (PSO) algorithm, one of the more recent stochastic optimization strategies. In this article, we propose a new, efficient PSO-based technique for segmenting magnetic resonance imaging (MRI) brain images. The proposed algorithm, called modified particle swarm optimization, is an improved variant of PSO designed specifically for optimal segmentation. A fitness function evaluates every particle in the swarm so that the particles can be arranged in descending order of fitness. The algorithm is evaluated using performance measures such as execution run time and the quality of the segmented image. Segmentation performance is demonstrated on a defined set of benchmark images and compared against conventional PSO, a genetic algorithm, and PSO with Mahalanobis distance-based segmentation. We then apply our method to MRI brain images to distinguish between normal and pathological tissues.
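The evaluate-and-rank step described above can be sketched as follows, assuming (hypothetically) a clustering-style encoding in which each particle holds candidate gray-level cluster centers. The abstract does not specify the fitness function; the negative within-cluster sum of squared errors used here and the helper names `fitness` and `rank_swarm` are assumptions for illustration only, not the modified PSO itself.

```python
def fitness(centers, pixels):
    """Hypothetical fitness: negative within-cluster sum of squared errors.

    Each pixel intensity is assigned to its nearest center; a higher value
    (closer to zero) means a tighter clustering.
    """
    return -sum(min((p - c) ** 2 for c in centers) for p in pixels)


def rank_swarm(swarm, pixels):
    """Evaluate every particle and arrange the swarm in descending order of fitness."""
    return sorted(swarm, key=lambda centers: fitness(centers, pixels), reverse=True)
```

For the toy intensities `[10, 12, 200, 205]` and the swarm `[[11, 202], [50, 150], [0, 255]]`, the particle `[11, 202]` (fitness -15) ranks first and `[50, 150]` (fitness -8569) ranks last.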
In image processing, segmenting an image into multiple regions is an important step before classification and recognition. Segmentation is widely used in many application fields, such as medical image analysis to characterize and detect anatomical structures, feature extraction in robotics for mobile robot localization and detection, and map processing to find lines and legends. Many image segmentation techniques have been developed. Among the most widely used are methods based on intelligent techniques known as metaheuristic algorithms, such as the Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), and Particle Swarm Optimization (PSO). In this paper, we describe a novel image segmentation method that determines multilevel thresholds for a given image using PSO, one of the most popular and efficient metaheuristic algorithms. The proposed method exploits the characteristics of PSO and improves the objective function value by updating the velocity and position of the particles. We compare this method with basic PSO and with other well-known multilevel segmentation methods to demonstrate its efficiency. Experimental results show that, according to several measures, the method segments images reliably and yields better threshold values than the other methods.
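The velocity and position updates mentioned above follow the standard global-best PSO rule. The sketch below applies that rule to multilevel thresholding, using Otsu's between-class variance as the objective. This is an assumption: the abstract names neither the objective function nor the authors' modification, so the code shows basic PSO (the baseline the method is compared against), not the proposed variant. The parameter values (inertia `w=0.7`, `c1=c2=1.5`, 30 particles, 100 iterations) and function names are illustrative choices.

```python
import random


def between_class_variance(hist, thresholds):
    """Otsu-style objective for multilevel thresholding (higher is better).

    Classes are [0, t1), [t1, t2), ..., [tk, len(hist)); empty classes are skipped.
    """
    total = sum(hist)
    mu_t = sum(i * h for i, h in enumerate(hist)) / total
    bounds = [0] + sorted(thresholds) + [len(hist)]
    var = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        w = sum(hist[lo:hi])
        if w == 0:
            continue
        mu = sum(i * hist[i] for i in range(lo, hi)) / w
        var += (w / total) * (mu - mu_t) ** 2
    return var


def pso_thresholds(hist, k, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search k thresholds with the standard global-best PSO update rule."""
    rng = random.Random(seed)
    top = len(hist) - 1

    def score(x):
        return between_class_variance(hist, [int(round(v)) for v in x])

    pos = [[rng.uniform(0, top) for _ in range(k)] for _ in range(n_particles)]
    vel = [[0.0] * k for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the gray-level range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), float(top))
            f = score(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return sorted(int(round(v)) for v in gbest)
```

For example, on a 256-bin histogram with equal spikes at gray levels 50, 130, and 210, `pso_thresholds(hist, 2)` should return two thresholds that place each spike in its own class. A fixed `seed` keeps runs reproducible; a real evaluation would average over several seeds.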