Abstract: Single-cell data provides means to dissect the composition of complex tissues and specialized cellular environments. However, the analysis of such measurements is complicated by high levels of technical noise and intrinsic biological variability. We describe a probabilistic model of expression magnitude distortions typical of single-cell RNA sequencing measurements, which enables detection of differential expression signatures and identification of subpopulations of cells in a way that is more tolerant of nois…
“…A number of approaches are being considered for assuring confidence in these measurements, including the use of the External RNA Controls Consortium (ERCC) spike‐in material (34, 35) to provide a metric of accuracy for those known sequences. Other approaches include the use of Bayesian statistics to assess real differences in the presence of dropout events (36) that occur due to the low amount of RNA in cell samples and result in detection of the gene in some cells and not in others. When the apparent heterogeneity in gene expression is simply due to technical issues, the data can lead to erroneous conclusions of biological heterogeneity.…”
Section: Unique Challenges and Opportunities Posed By Single‐cell Ana… (mentioning)
The high‐content interrogation of single cells with platforms optimized for the multiparameter characterization of cells in liquid and solid biopsy samples can enable characterization of heterogeneous populations of cells ex vivo. Doing so will advance the diagnosis, prognosis, and treatment of cancer and other diseases. However, it is important to understand the unique issues in resolving heterogeneity and variability at the single cell level before navigating the validation and regulatory requirements in order for these technologies to impact patient care. Since 2013, leading experts representing industry, academia, and government have been brought together as part of the Foundation for the National Institutes of Health (FNIH) Biomarkers Consortium to foster the potential of high‐content data integration for clinical translation.
“…As a result, mRNA molecules in a cell can be randomly missed during the reverse transcription step and the following cDNA amplification step, and the mRNA products of some genes may be totally missed in the capturing procedure, which then produces dropout zeros in the scRNA-seq data (3,25,26). In this section, we try to model this mRNA capture procedure and study what impact this process will have on the ZINB distribution.…”
Section: Model the mRNA Capture Procedures (mentioning)
confidence: 99%
“…This phenomenon is called dropout events (25,26). We call this type of zero values as dropout zeros.…”
Section: Introduction (mentioning)
confidence: 99%
“…The methods we compared with include traditional DE detection methods edgeR (16) and DEGseq (17) that were developed for bulk RNA-seq data but have been also used on many scRNA-seq data (12), as well as new methods specifically developed for scRNA-seq data including BPSC (34), D3E (19), monocle (35), SCDE (25). We applied DEsingle on a real scRNA-seq dataset of human preimplantation embryonic cells of different days of embryo development (36) and found interesting and 10% represent the observed data obtained from the original data after mRNA capture procedure.…”
There are excessive zero values in single-cell RNA-seq (scRNA-seq) data. Some of them are real zeros of non-expressed genes, while the others are the so-called "dropout" zeros caused by the low mRNA capture efficiency of tiny amounts of mRNAs in single cells. These two types of zeros should be distinguished in differential expression (DE) analysis and other types of analyses of scRNA-seq data. We proposed a new method, DEsingle, for DE analysis in scRNA-seq data by employing the Zero-Inflated Negative Binomial (ZINB) model. We proved that DEsingle could estimate the percentage of real zeros and dropout zeros by modelling the mRNA capture procedure. According to this model, DEsingle can distinguish three types of differential expression between two groups of single cells, with regard to differences in expression status, in expression abundances, and in both. We validated the performance of the method on simulated data and applied it to real scRNA-seq data of human preimplantation embryonic cells of different days of embryo development. Results showed that DEsingle outperforms existing methods for scRNA-seq DE analysis, and can reveal different types of DE genes that are enriched in different functions.
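To illustrate the ZINB model named in the abstract above, here is a minimal Python sketch of a zero-inflated negative binomial probability mass function. It is not DEsingle's implementation. The parameter names `theta` (zero-inflation weight), `r` (size) and `p` (success probability), and the helper names, are assumptions chosen for this example. The sketch only shows how the probability of observing a zero splits into a constant-zero component and a negative binomial component.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf for count k, size r > 0 and probability 0 < p < 1."""
    # Work in log space so large counts do not overflow the gamma function.
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def zinb_pmf(k, theta, r, p):
    """ZINB pmf: a constant zero with weight theta, otherwise NB(r, p)."""
    point_mass = theta if k == 0 else 0.0
    return point_mass + (1.0 - theta) * nb_pmf(k, r, p)


def zero_components(theta, r, p):
    """Split P(X = 0) into the constant-zero part and the NB part."""
    return theta, (1.0 - theta) * nb_pmf(0, r, p)


# Illustrative (hypothetical) parameter values, not estimates from any dataset.
constant_zero, nb_zero = zero_components(theta=0.3, r=2.0, p=0.5)
total_zero = zinb_pmf(0, theta=0.3, r=2.0, p=0.5)
```

With these illustrative values the two components sum to the total zero probability returned by `zinb_pmf(0, ...)`. A fitted model would estimate `theta`, `r` and `p` from the counts of each gene rather than fixing them by hand.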
The recent developments in high‐throughput single‐cell RNA sequencing technology (scRNA‐seq) have enabled the generation of vast amounts of transcriptomic data at cellular resolution. With these advances come new modes of data analysis, building on high‐dimensional data mining techniques. Here, we consider biological questions for which scRNA‐seq data is used, both at a cell and gene level, and describe tools available for these types of analyses. This is an exciting and rapidly evolving field, where clustering, pseudotime inference, branching inference and gene‐level analyses are particularly informative areas of computational analysis.