Generating textual descriptions of the differences between images is a relatively new task that requires fusing computer vision and natural language processing techniques. In this paper, we present a novel Fully Convolutional CaptionNet (FCC) that employs an encoder-decoder framework to extract visual features, compute feature distances, and generate sentences describing the measured differences. After the image features are extracted, a contrastive function computes their weighted L1 distance, which is learned and selectively attended to determine salient sections of the feature map at every time step. The attended feature region is matched to corresponding words iteratively until a sentence is completed. We also propose applying an upsampling network to enlarge the features' field of view, which provides a robust pixel-level discrepancy computation. Extensive experiments indicate that the FCC model outperforms other learning models on the benchmark Spot-the-Diff dataset, generating succinct and meaningful textual descriptions of image differences.
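The contrastive step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the use of a softmax to form the attention weights, and the per-element learned weights are all assumptions for exposition.

```python
import math

def weighted_l1_attention(feat_a, feat_b, weights):
    """Sketch of a weighted L1 contrastive step with softmax attention.

    feat_a, feat_b: flattened feature vectors from the two images.
    weights: learned per-element weights (illustrative stand-in).
    """
    # Element-wise absolute difference (L1), scaled by the learned weights.
    diff = [w * abs(a - b) for a, b, w in zip(feat_a, feat_b, weights)]
    # Softmax over the difference map to highlight salient regions.
    exps = [math.exp(d) for d in diff]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attended difference: regions with larger discrepancies dominate.
    return [a * d for a, d in zip(attn, diff)]
```

In a full model, the attended difference vector would condition a recurrent decoder that emits one word per time step; here only the distance-and-attention computation is shown.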
INDEX TERMS Image captioning, deep learning, Siamese network, recurrent neural network, convolutional neural network, attention, fully convolutional networks.
Efforts to improve the accuracy of medical diagnostics in molecular medicine have led to the wide use of artificial neural network (ANN) algorithms for disease detection, owing to their ability to process large medical datasets and integrate them into characterized outputs that help avoid misdiagnosis. ANNs have proven useful in sample analyses of patients with diabetes and in decision support systems. Over the years, various ANN models have been applied to medical diagnostics; however, these approaches still exhibit notable error rates and lower training and testing accuracies in disease detection. In this study, we propose a Feedforward Artificial Neural Network (FFANN) model with a dense architecture suitable for processing numeric and textual datasets. We carefully designed the model structure to balance the number of layers and nodes needed to learn every feature of the dataset against effective computation, while avoiding underfitting and overfitting, which occur when too few or too many layers are used, respectively. This approach puts our model ahead of other state-of-the-art prediction models, achieving 97.27% training and 96.09% testing accuracy for type 2 diabetes detection on the Pima Indian Diabetes dataset.
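A feedforward dense network of the kind described reduces to a stack of fully connected layers applied in sequence. The sketch below is a minimal, from-scratch forward pass for exposition only; the layer sizes, sigmoid activation, and function names are illustrative assumptions, not the paper's exact architecture.

```python
import math

def dense_layer(inputs, weights, biases):
    """One fully connected layer: y = sigmoid(W x + b).

    weights: list of rows, one per output node; biases: one per node.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation
    return outputs

def ffann_forward(x, layers):
    """Feedforward pass: apply each (weights, biases) layer in turn."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x
```

Training such a model would additionally require a loss function and backpropagation (or a library such as Keras); only the forward computation that defines the architecture is shown here.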