Background: Flow cytometry (FC)-based computer-aided diagnostics is an emerging technique utilizing modern multiparametric cytometry systems. The major difficulty in using machine-learning approaches for classification of FC data arises from limited access to a wide variety of anomalous samples for training. In consequence, any learning with an abundance of normal cases and a limited set of specific anomalous cases is biased towards the types of anomalies represented in the training set. Such models do not accurately identify anomalies, whether previously known or unknown, that may exist in future samples tested. Although one-class classifiers trained using only normal cases would avoid such a bias, robust sample characterization is critical for a generalizable model. Owing to sample heterogeneity and instrumental variability, arbitrary characterization of samples usually introduces feature noise that may lead to poor predictive performance. Herein, we present a non-parametric Bayesian algorithm called ASPIRE (anomalous sample phenotype identification with random effects) that identifies phenotypic differences across a batch of samples in the presence of random effects. Our approach involves simultaneous clustering of cellular measurements in individual samples and matching of discovered clusters across all samples in order to recover global clusters using probabilistic sampling techniques in a systematic way.
Results: We demonstrate the performance of the proposed method in identifying anomalous samples in two different FC data sets, one of which represents a set of samples including acute myeloid leukemia (AML) cases, and the other a generic 5-parameter peripheral-blood immunophenotyping. Results are evaluated in terms of the area under the receiver operating characteristics curve (AUC). ASPIRE achieved AUCs of 0.99 and 1.0 on the AML and generic blood immunophenotyping data sets, respectively.
Conclusions: These results demonstrate that anomalous samples can be identified by ASPIRE with almost perfect accuracy without a priori access to samples of anomalous subtypes in the training set. The ASPIRE approach is unique in its ability to form generalizations regarding normal and anomalous states given only very weak assumptions regarding sample characteristics and origin. Thus, ASPIRE could become highly instrumental in providing unique insights about observed biological phenomena in the absence of full information about the investigated samples.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-314) contains supplementary material, which is available to authorized users.
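The AUC used to evaluate the ASPIRE results above can be read as the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `auc_from_scores` and the convention that label 1 means anomalous are assumptions made for illustration.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    scores: per-sample anomaly scores, higher meaning more anomalous (assumed).
    labels: 1 for anomalous samples, 0 for normal samples (assumed convention).
    """
    anomalous = [s for s, y in zip(scores, labels) if y == 1]
    normal = [s for s, y in zip(scores, labels) if y == 0]
    if not anomalous or not normal:
        raise ValueError("need at least one anomalous and one normal sample")

    # Count anomalous/normal pairs ranked correctly; ties count as half.
    wins = 0.0
    for a in anomalous:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous) * len(normal))
```

An AUC of 1.0 means every anomalous sample outscores every normal sample, and 0.5 corresponds to a ranking no better than chance.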
A recently introduced technique for pathogen recognition called BARDOT (BActeria Rapid Detection using Optical scattering Technology) belongs to the broad class of optical sensors and relies on forward-scatter phenotyping (FSP). The specificity of FSP derives from the morphological information that bacterial material encodes on a coherent optical wavefront passing through the colony. The system collects elastically scattered light patterns that, given a constant environment, are unique to each bacterial species and serovar. The notable similarity between FSP technology and spectroscopies is their reliance on statistical machine learning to perform recognition. Currently used methods utilize traditional supervised techniques which assume completeness of training libraries. However, this restrictive assumption is known to be false for most experimental conditions, resulting in unsatisfactory levels of accuracy, poor specificity, and consequently limited overall performance for biodetection and classification tasks. The presented work demonstrates application of the BARDOT system to classify bacteria belonging to the Salmonella class in a nonexhaustive framework, that is, without full knowledge about all the possible classes that can be encountered. Our study uses a Bayesian approach to learning with a nonexhaustive training dataset to allow for the automated detection of unknown bacterial classes.
Technologies for rapid detection of bacterial pathogens are crucial for securing the food supply. A light-scattering sensor recently developed for real-time identification of multiple colonies has shown great promise for distinguishing bacterial cultures. The classification approach currently used with this system relies on supervised learning. For accurate classification of bacterial pathogens, the training library should be exhaustive, i.e., should consist of samples of all possible pathogens. Yet the sheer number of existing bacterial serovars and, more importantly, the effect of their high mutation rate make a practical and manageable training library infeasible. In this study, we propose a Bayesian approach to learning with a nonexhaustive training dataset for automated detection of unmatched bacterial serovars, i.e., serovars for which no samples exist in the training library. The main contribution of our work is the Wishart conjugate priors defined over class distributions. This allows us to employ the prior information obtained from known classes to make inferences about unknown classes as well. By this means, we identify new classes of informational value and dynamically update the training dataset with these classes to make it increasingly representative of the sample population. This results in a classifier with improved predictive performance for future samples. We evaluated our approach on a 28-class bacteria dataset and also on the benchmark 26-class letter recognition dataset for further validation. The proposed approach is compared against state-of-the-art methods, including density-based approaches and support vector domain description, as well as a recently introduced Bayesian approach based on simulated classes.
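To illustrate how a conjugate prior over class covariances can support detection of unmatched classes, the sketch below regularizes each known class covariance with an inverse-Wishart prior and flags a sample as unknown when no known class explains it well. This is a simplified illustration and not the method of the paper: the inverse-Wishart parameterization, the plug-in use of the sample mean as if it were known, the fixed log-likelihood `threshold`, and all function names are assumptions.

```python
import numpy as np

def posterior_mean_covariance(X, psi, m):
    """Posterior mean of a class covariance under an inverse-Wishart(psi, m) prior.

    Assumption: the class mean is treated as known and set to the sample mean,
    so the posterior is inverse-Wishart(psi + scatter, m + n).
    Requires m + n - d - 1 > 0.
    """
    n, d = X.shape
    centered = X - X.mean(axis=0)
    scatter = centered.T @ centered
    return (psi + scatter) / (m + n - d - 1)

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def classify_or_flag(x, classes, psi, m, threshold):
    """Return the best-matching known class label, or None if x looks novel.

    classes: dict mapping label -> (n, d) array of training samples.
    threshold: hypothetical log-likelihood cutoff chosen by the user.
    """
    best_label, best_ll = None, -np.inf
    for label, X in classes.items():
        cov = posterior_mean_covariance(X, psi, m)
        ll = gaussian_logpdf(x, X.mean(axis=0), cov)
        if ll > best_ll:
            best_label, best_ll = label, ll
    return best_label if best_ll >= threshold else None
```

Because the prior scale matrix `psi` is shared by all classes, information from well-sampled known classes shapes the covariance estimate of sparsely sampled ones, which is the intuition behind using known classes to reason about unknown ones.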
We present a new direction for semi-supervised learning where self-adjusting generative models replace fixed ones and unlabeled data can potentially improve learning even when labeled data is only partially observed. We model the data of each class by a mixture model and use a hierarchical Dirichlet process (HDP) to model observed as well as unobserved classes. We extend the standard HDP model to accommodate unlabeled samples and introduce a new sharing strategy, within the context of Gaussian mixture models, that restricts sharing with covariance matrices while leaving the mean vectors free. Our research is mainly driven by real-world applications with evolving data-generating mechanisms where obtaining a fully observed labeled data set is impractical. We demonstrate the feasibility of the proposed approach for semi-supervised learning in two such applications.
In this paper, we propose a segmentation algorithm that combines ideas from local watershed transforms and region-based deformable models. Traditionally, watersheds are computed over the whole image, and region-merging techniques are then applied to them to segment structures. Here, we propose that watershed regions can be used as operators in region-based deformable models. These regions are computed only when the deformable models reach them. Then, they are added to (or subtracted from) the deformable models via a measure computed from two terms: (i) statistical fit of regions to the models, region competition; (ii) smoothness of such fits, smoothness constraint. The proposed algorithm is computationally efficient because it operates on regions instead of pixels. In addition, this algorithm allows better boundary localization due to the edge information brought by watersheds. Moreover, the proposed algorithm can handle topological changes, e.g., split or merge, during the evolution without an additional embedded surface as in the case of level set formulation. Furthermore, structure-based smoothness of segmented objects is obtained by using the smoothness term computed from the alignment of regions. We illustrate the efficiency and accuracy of the proposed technique on several medical data sets, such as MRA and CTA data.
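The two-term measure described above (region competition plus a smoothness constraint) can be sketched as a per-region decision rule. This is a hypothetical illustration only: the Gaussian intensity models, the use of the fraction of neighboring regions already inside the model as a stand-in for the smoothness term, and the names `should_add_region` and `smooth_weight` are assumptions, not the formulation of the paper.

```python
import math

def mean_gaussian_loglik(values, mean, std):
    """Average per-pixel log-likelihood of intensities under a 1-D Gaussian."""
    total = 0.0
    for v in values:
        total += -0.5 * math.log(2.0 * math.pi * std * std)
        total -= (v - mean) ** 2 / (2.0 * std * std)
    return total / len(values)

def should_add_region(values, inside, outside, neighbor_fraction_inside,
                      smooth_weight=1.0):
    """Decide whether a watershed region joins the deformable model.

    values: pixel intensities of the candidate region.
    inside, outside: (mean, std) of the assumed object and background models.
    neighbor_fraction_inside: share of adjacent regions already in the model,
        in [0, 1]; a hypothetical proxy for the smoothness term.
    """
    # Region competition: which statistical model fits the region better?
    fit = (mean_gaussian_loglik(values, *inside)
           - mean_gaussian_loglik(values, *outside))
    # Smoothness: favor decisions that agree with the neighboring regions.
    smooth = smooth_weight * (2.0 * neighbor_fraction_inside - 1.0)
    return fit + smooth > 0.0
```

Operating on whole regions means the decision is evaluated once per watershed region rather than once per pixel, which reflects the efficiency argument made in the text.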