Motivation: The clustering of biomedical images according to their phenotype is an important step in early drug discovery. Modern high-content-screening devices easily produce thousands of cell images, but the resulting data is usually unlabelled, and it requires extra effort to construct a visual representation that supports grouping according to the presented morphological characteristics.
Results: We introduce a novel approach to visual representation learning that is guided by metadata. In high-content-screening, metadata can typically be derived from the experimental layout, which links each cell image of a particular assay to the tested chemical compound and the corresponding compound concentration. In general, there exists a one-to-many relationship between phenotype and compound, since various molecules and different dosages can lead to the same alterations in biological cells. Our empirical results show that metadata-guided visual representation learning is an effective approach for clustering biomedical images. We have evaluated our proposed approach on both benchmark and real-world biological data. Furthermore, we have juxtaposed implicit and explicit learning techniques, which differ in both loss function and batch construction. Our experiments demonstrate that metadata-guided visual representation learning is able to identify commonalities and distinguish differences in visual appearance that lead to meaningful clusters, even without image-level annotations. Note: Please refer to the supplementary material for implementation details on metadata-guided visual representation learning strategies.
Recurrence quantification analysis (RQA) is a well-known tool for studying the nonlinear behavior of dynamical systems, e.g. for finding transitions in climate data or classifying reading abilities. However, constructing a recurrence plot and subsequently quantifying its small- and large-scale structures is computationally demanding, especially for long time series or data streams with a high sample rate. One way to reduce the time and space complexity of RQA is to use approximations, which are sufficient for many data analysis tasks, although they do not guarantee exact solutions. In earlier work, we proposed how to approximate diagonal-line-based RQA measures and showed how these approximations perform in finding transitions for difference equations. The present work aims at extending these approximations to vertical-line-based RQA measures and investigating the runtime and accuracy of our approximate RQA measures on real-life climate data. Our empirical evaluation shows that the proposed approximate RQA measures achieve tremendous speedups without losing much accuracy.
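To make the cost concrete, the following is a minimal sketch of the exact (non-approximate) computation for a scalar time series, which is the quadratic baseline that approximations aim to avoid. It is not the approximation scheme from the abstract. The function names, the absolute-difference recurrence criterion, the threshold `eps`, and the minimum line lengths `lmin` and `vmin` are illustrative assumptions. Determinism (DET) is used as a diagonal-line-based measure and laminarity (LAM) as a vertical-line-based one.

```python
def recurrence_matrix(x, eps):
    """Binary recurrence matrix: R[i][j] = 1 if |x[i] - x[j]| <= eps (O(n^2) time and space)."""
    n = len(x)
    return [[1 if abs(x[i] - x[j]) <= eps else 0 for j in range(n)] for i in range(n)]


def run_lengths(seq):
    """Lengths of maximal runs of 1s in a 0/1 sequence."""
    runs, run = [], 0
    for v in seq:
        if v:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return runs


def recurrence_rate(R):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def determinism(R, lmin=2):
    """Share of off-diagonal recurrent points lying on diagonal lines of length >= lmin."""
    n = len(R)
    total = on_lines = 0
    for k in range(-(n - 1), n):
        if k == 0:
            continue  # skip the main diagonal (line of identity)
        diag = [R[i][i + k] for i in range(max(0, -k), min(n, n - k))]
        for length in run_lengths(diag):
            total += length
            if length >= lmin:
                on_lines += length
    return on_lines / total if total else 0.0


def laminarity(R, vmin=2):
    """Share of recurrent points lying on vertical lines of length >= vmin.

    Simplification: the main diagonal is not excluded here.
    """
    n = len(R)
    total = on_lines = 0
    for j in range(n):
        column = [R[i][j] for i in range(n)]
        for length in run_lengths(column):
            total += length
            if length >= vmin:
                on_lines += length
    return on_lines / total if total else 0.0


# A strictly periodic signal: recurrences form long diagonals and no vertical lines.
R = recurrence_matrix([0, 1, 0, 1, 0, 1], eps=0.1)
print(recurrence_rate(R), determinism(R), laminarity(R))
```

Both the matrix construction and the line scans touch all n² entries, which is why long series or high sample rates become expensive.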
In time series mining, the Dynamic Time Warping (DTW) distance is a widely used similarity measure. Since the computational complexity of the DTW distance is quadratic, various kinds of warping constraints, lower bounds and abstractions have been developed to speed up time series mining under the DTW distance. In this contribution, we propose a novel Lucky Time Warping (LTW) distance, with linear time and space complexity, which uses a greedy algorithm to accelerate distance calculations for nearest neighbor classification. The results show that, compared to the Euclidean distance (ED) and the (un)constrained DTW distance, our LTW distance trades classification accuracy against computational cost reasonably well, and can therefore be used as a fast alternative for nearest neighbor time series classification.
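The contrast between quadratic and linear cost can be sketched as follows. The first function is the standard dynamic-programming DTW with squared point-wise cost. The second, `greedy_warp`, is a hypothetical illustration of a greedy warping path that always steps to the cheapest neighbouring cell. It is not the published LTW algorithm, whose details are not given in the abstract, and all names here are assumptions.

```python
def dtw(a, b):
    """Unconstrained DTW with squared point-wise cost: O(n*m) time, O(m) space."""
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def greedy_warp(a, b):
    """Hypothetical greedy warping: follow the locally cheapest step from (0, 0) to the end.

    Takes at most n + m - 2 steps, so it runs in linear time with constant extra space.
    The path is a valid warping path, so the result is never below the DTW distance.
    """
    n, m = len(a), len(b)
    i = j = 0
    total = (a[0] - b[0]) ** 2
    while i < n - 1 or j < m - 1:
        candidates = []
        if i < n - 1 and j < m - 1:
            candidates.append(((a[i + 1] - b[j + 1]) ** 2, i + 1, j + 1))
        if i < n - 1:
            candidates.append(((a[i + 1] - b[j]) ** 2, i + 1, j))
        if j < m - 1:
            candidates.append(((a[i] - b[j + 1]) ** 2, i, j + 1))
        cost, i, j = min(candidates)  # ties are broken by the smaller indices
        total += cost
    return total


def squared_euclidean(a, b):
    """Lock-step squared Euclidean distance for equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]  # same shape as a, shifted by one step
print(squared_euclidean(a, b), dtw(a, b), greedy_warp(a, b))
```

On this shifted pair both warping distances absorb the shift while the lock-step distance does not. In general a greedy path can miss the optimal alignment, which is the accuracy-for-speed trade-off the abstract describes.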
Several deep learning approaches have been proposed to address the challenges in computational pathology by learning structural details in an unbiased way. Transfer learning allows the representation learned by a pretrained model to be used directly or fine-tuned for a new domain. However, in histopathology, the problem domain is tissue-specific and putting together a labelled data set is challenging. On the other hand, whole-slide-level annotations, such as biomarker levels, are much easier to obtain. We compare two pretrained models, one histology-specific and one trained on ImageNet, on various computational pathology tasks. We show that a domain-specific model (HistoNet) contains richer information for biomarker classification, localization of biomarker-relevant morphology within a slide, and the prediction of expert-graded features. We use a weakly supervised approach to discriminate slides based on biomarker level and simultaneously predict which regions contribute to that prediction. We employ multitask learning to show that learned representations correlate with morphological features graded by expert pathologists. All of these results are demonstrated in the context of renal toxicity in a mechanistic study of compound toxicity in rat models. Our results emphasize the importance of histology-specific models and their knowledge representations for solving a wide range of computational pathology tasks.