Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST.Electronic supplementary materialThe online version of this article (doi:10.1186/s13059-015-0844-5) contains supplementary material, which is available to authorized users.
Although it is increasingly evident that cancer is influenced by signals emanating from tumor stroma, little is known regarding how changes in stromal gene expression affect epithelial tumor progression. We used laser capture microdissection to compare gene expression profiles of tumor stroma from 53 primary breast tumors and derived signatures strongly associated with clinical outcome. We present a new stroma-derived prognostic predictor (SDPP) that stratifies disease outcome independently of standard clinical prognostic factors and published expression-based predictors. The SDPP predicts outcome in several published whole tumor-derived expression data sets, identifies poor-outcome individuals from multiple clinical subtypes, including lymph node-negative tumors, and shows increased accuracy with respect to previously published predictors, especially for HER2-positive tumors. Prognostic power increases substantially when the predictor is combined with existing outcome predictors. Genes represented in the SDPP reveal the strong prognostic capacity of differential immune responses as well as angiogenic and hypoxic responses, highlighting the importance of stromal biology in tumor progression.
Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks – mammalian cell population identification to determine if automated algorithms can reproduce expert manual gating, and sample classification to determine if analysis pipelines can identify characteristics that correlate with external variables (e.g., clinical outcome). This analysis presents the results of the first of these challenges. Several methods performed well compared to manual gating or external variables using statistical performance measures, suggesting that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
Motivation: Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions now enables high-throughput single-cell gene expression measurement, allowing assessment of cellular heterogeneity. However, few analytic tools have been developed specifically for the statistical and analytical challenges of single-cell quantitative polymerase chain reactions data.Results: We present a statistical framework for the exploration, quality control and analysis of single-cell gene expression data from microfluidic arrays. We assess accuracy and within-sample heterogeneity of single-cell expression and develop quality control criteria to filter unreliable cell measurements. We propose a statistical model accounting for the fact that genes at the single-cell level can be on (and a continuous expression measure is recorded) or dichotomously off (and the recorded expression is zero). Based on this model, we derive a combined likelihood ratio test for differential expression that incorporates both the discrete and continuous components. Using an experiment that examines treatment-specific changes in expression, we show that this combined test is more powerful than either the continuous or dichotomous component in isolation, or a t-test on the zero-inflated data. Although developed for measurements from a specific platform (Fluidigm), these tools are generalizable to other multi-parametric measures over large numbers of events.Availability: All results presented here were obtained using the SingleCellAssay R package available on GitHub (http://github.com/RGLab/SingleCellAssay).Contact: rgottard@fhcrc.orgSupplementary information: Supplementary data are available at Bioinformatics online.
Advances in flow cytometry and other single-cell technologies have enabled high-dimensional, high-throughput measurements of individual cells and allowed interrogation of cell population heterogeneity. Computational tools to take full advantage of these technologies are lacking. Here, we present COMPASS, a computational framework for unbiased polyfunctionality analysis of antigen-specific T-cell subsets. COMPASS uses a Bayesian hierarchical framework to model all observed functional cell subsets and select those most likely to exhibit antigen-specific responses. Cell-subset responses are quantified by posterior probabilities, while subject-level responses are quantified by two novel summary statistics that can be correlated directly with clinical outcome, and describe the quality of an individual’s (poly)functional response. Using three clinical datasets of cytokine production we demonstrate how COMPASS improves characterization of antigen-specific T cells and reveals novel cellular correlates of protection in the RV144 HIV vaccine efficacy trial that are missed by other methods. COMPASS is available as open-source software.
Standardization of immunophenotyping requires careful attention to reagents, sample handling, instrument setup, and data analysis, and is essential for successful cross-study and cross-center comparison of data. Experts developed five standardized, eight-color panels for identification of major immune cell subsets in peripheral blood. These were produced as pre-configured, lyophilized, reagents in 96-well plates. We present the results of a coordinated analysis of samples across nine laboratories using these panels with standardized operating procedures (SOPs). Manual gating was performed by each site and by a central site. Automated gating algorithms were developed and tested by the FlowCAP consortium. Centralized manual gating can reduce cross-center variability, and we sought to determine whether automated methods could streamline and standardize the analysis. Within-site variability was low in all experiments, but cross-site variability was lower when central analysis was performed in comparison with site-specific analysis. It was also lower for clearly defined cell subsets than those based on dim markers and for rare populations. Automated gating was able to match the performance of central manual analysis for all tested panels, exhibiting little to no bias and comparable variability. Standardized staining, data collection, and automated gating can increase power, reduce variability, and streamline analysis for immunophenotyping.
Flow cytometry is used increasingly in clinical research for cancer, immunology and vaccines. Technological advances in cytometry instrumentation are increasing the size and dimensionality of data sets, posing a challenge for traditional data management and analysis. Automated analysis methods, despite a general consensus of their importance to the future of the field, have been slow to gain widespread adoption. Here we present OpenCyto, a new BioConductor infrastructure and data analysis framework designed to lower the barrier of entry to automated flow data analysis algorithms by addressing key areas that we believe have held back wider adoption of automated approaches. OpenCyto supports end-to-end data analysis that is robust and reproducible while generating results that are easy to interpret. We have improved the existing, widely used core BioConductor flow cytometry infrastructure by allowing analysis to scale in a memory efficient manner to the large flow data sets that arise in clinical trials, and integrating domain-specific knowledge as part of the pipeline through the hierarchical relationships among cell populations. Pipelines are defined through a text-based csv file, limiting the need to write data-specific code, and are data agnostic to simplify repetitive analysis for core facilities. We demonstrate how to analyze two large cytometry data sets: an intracellular cytokine staining (ICS) data set from a published HIV vaccine trial focused on detecting rare, antigen-specific T-cell populations, where we identify a new subset of CD8 T-cells with a vaccine-regimen specific response that could not be identified through manual analysis, and a CyTOF T-cell phenotyping data set where a large staining panel and many cell populations are a challenge for traditional analysis. The substantial improvements to the core BioConductor flow cytometry packages give OpenCyto the potential for wide adoption. It can rapidly leverage new developments in computational cytometry and facilitate reproducible analysis in a unified environment.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.