The classification of whole slide images (WSIs) provides physicians with an accurate analysis of disease and helps them treat patients effectively, and it can be linked to further detailed analysis and diagnosis. Deep learning (DL) has made significant advances in medicine, including the use of magnetic resonance imaging (MRI) scans, computed tomography (CT) scans, and electrocardiograms (ECGs) to detect life-threatening conditions such as heart disease, cancer, and brain tumors. Progress in computational pathology, however, has been slower; the main hurdle is the shortage of large labeled datasets of histopathology images for training models. The Kimia Path24 dataset was created specifically for the classification and retrieval of histopathology images; it contains 23,916 histopathology patches spanning 24 tissue texture classes. We propose a transfer-learning-based framework and evaluate it on two well-known DL models, Inception-V3 and VGG-16. To improve their performance, we take features derived from their pre-trained weights, concatenate these with the image vector, and use the result as input for training the same architecture. Experiments show that the proposed modification improves the accuracy of both models: patch-to-scan accuracy improves from 0.65 to 0.77 for VGG-16 and from 0.74 to 0.79 for Inception-V3.
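The abstract's core idea, concatenating features from a pre-trained backbone with the flattened image vector before retraining, can be sketched minimally in numpy. This is a data-flow illustration only: the feature extractor below is a hypothetical stand-in (a real pipeline would run the image through VGG-16 or Inception-V3 loaded with ImageNet weights), and the 4096-dimensional feature size is an assumption, not the paper's exact configuration.

```python
import numpy as np

def backbone_features(image):
    """Hypothetical stand-in for a pre-trained VGG-16/Inception-V3
    feature extractor; a real pipeline would load ImageNet weights
    and return the penultimate-layer activations."""
    rng = np.random.default_rng(0)
    return rng.standard_normal(4096)

def fused_input(image):
    # Concatenate pre-trained features with the flattened raw image
    # vector; the fused vector is then fed back as training input.
    flat = image.reshape(-1)
    return np.concatenate([backbone_features(image), flat])
```

For a 224x224x3 patch this yields a fused vector of length 4096 + 150528 = 154624, which the retrained network consumes in place of the raw image alone.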
In this paper, we propose a deep-learning technique for recognizing multiple actions in a video. The proposed approach interprets the overall context of a video and transforms it into one or more appropriate actions. To cope with multiple actions in a video, our technique first determines the individual segments/shots using intersections of color histograms. The segmented parts are then fed to an action recognition system combining a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network trained on our action vocabulary. The segments are then labeled according to their predicted actions, and a compact set of distinct actions is produced. Using the corpus generated by the shot detection phase, which includes the locations of key frames and the start/end timestamps of each shot, we can also perform video segmentation based on an action query. Hence, the proposed technique can be used for a number of tasks such as content censoring, on-demand scene retrieval, video summarization, and query-based scene/video retrieval, to name a few. It also stands apart from existing approaches, which either do not take motion information into account for action prediction or do not perform action-based video segmentation. The experimental results presented in this paper show that the proposed technique not only finds the complete set of actions present in a video, but can also find all the parts of a video relevant to an action query.
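The shot-detection step described above, declaring a boundary where the color-histogram intersection between consecutive frames drops, can be sketched as follows. The bin count and threshold are illustrative assumptions, not the paper's settings, and frames are taken as raw numpy arrays rather than decoded video.

```python
import numpy as np

def color_hist(frame, bins=16):
    # Per-channel histogram over pixel values 0-255, concatenated
    # across channels and normalized to sum to 1.
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(frame.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def hist_intersection(h1, h2):
    # Sum of bin-wise minima: ~1.0 for near-identical frames,
    # ~0.0 for frames with disjoint color distributions.
    return np.minimum(h1, h2).sum()

def detect_shot_boundaries(frames, threshold=0.5):
    # Mark a shot boundary wherever the intersection between
    # consecutive frames falls below the threshold.
    boundaries = []
    prev = color_hist(frames[0])
    for i in range(1, len(frames)):
        cur = color_hist(frames[i])
        if hist_intersection(prev, cur) < threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```

Histogram intersection is a natural choice here because it is bounded in [0, 1] for normalized histograms, so a single threshold separates within-shot frame pairs (high intersection) from abrupt cuts (low intersection).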