SUMMARY
Single-cell RNA sequencing (scRNA-seq) data are commonly affected by technical artifacts known as “doublets,” which limit cell throughput and lead to spurious biological conclusions. Here, we present DoubletFinder, a computational doublet detection tool that identifies doublets using only gene expression data. DoubletFinder predicts doublets according to each real cell’s proximity in gene expression space to artificial doublets created by averaging the transcriptional profiles of randomly chosen cell pairs. We first use scRNA-seq datasets in which the identity of doublets is known to show that DoubletFinder identifies doublets formed from transcriptionally distinct cells. When these doublets are removed, the identification of differentially expressed genes is enhanced. Second, we provide a method for estimating DoubletFinder input parameters, allowing its application across scRNA-seq datasets with diverse distributions of cell types. Lastly, we present “best practices” for DoubletFinder applications and illustrate that DoubletFinder is insensitive to an experimentally validated kidney cell type with “hybrid” expression features.
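The core idea in the summary, scoring each real cell by how close it sits to artificial doublets made by averaging random pairs of cells, can be illustrated with a short NumPy sketch. This is not DoubletFinder's own implementation: the PCA-by-SVD projection, the Euclidean k-nearest-neighbour search, the top-N thresholding by an expected doublet count, and the parameter names `n_artificial`, `k` and `n_pcs` are all assumptions made for illustration. The input is assumed to be an already-normalized cells-by-genes matrix, and the demo data are synthetic.

```python
import numpy as np


def doublet_scores(X, n_artificial, k, n_pcs=10, seed=0):
    """Return, for each real cell, the fraction of its k nearest neighbours that are artificial doublets.

    X is a (cells x genes) array of already-normalised expression values.
    n_artificial, k and n_pcs are hypothetical parameter names used only in this sketch.
    """
    rng = np.random.default_rng(seed)
    n_real = X.shape[0]

    # Build artificial doublets by averaging the profiles of random pairs of distinct cells.
    i = rng.integers(0, n_real, size=n_artificial)
    j = (i + rng.integers(1, n_real, size=n_artificial)) % n_real
    artificial = (X[i] + X[j]) / 2.0
    merged = np.vstack([X, artificial])

    # Project real and artificial cells together onto their top principal components (PCA via SVD).
    centred = merged - merged.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    n_pcs = min(n_pcs, s.size)
    pcs = u[:, :n_pcs] * s[:n_pcs]

    # Euclidean distances from every real cell to every merged cell (memory grows quadratically).
    d = np.linalg.norm(pcs[:n_real, None, :] - pcs[None, :, :], axis=2)
    d[np.arange(n_real), np.arange(n_real)] = np.inf  # a cell is not its own neighbour

    # Rows n_real onwards in `merged` are artificial doublets.
    neighbours = np.argsort(d, axis=1)[:, :k]
    return (neighbours >= n_real).mean(axis=1)


def call_doublets(scores, expected_doublets):
    """Flag the `expected_doublets` highest-scoring cells as predicted doublets."""
    calls = np.zeros(scores.size, dtype=bool)
    calls[np.argsort(-scores)[:expected_doublets]] = True
    return calls


# Usage on synthetic data: two cell types plus 10 heterotypic doublets (rows 200-209).
rng = np.random.default_rng(1)
n_genes = 50
type_a = np.concatenate([np.full(25, 5.0), np.zeros(25)])
type_b = np.concatenate([np.zeros(25), np.full(25, 5.0)])
cells_a = type_a + rng.normal(size=(100, n_genes))
cells_b = type_b + rng.normal(size=(100, n_genes))
true_doublets = (cells_a[:10] + cells_b[:10]) / 2.0
X = np.vstack([cells_a, cells_b, true_doublets])

# Two PCs are enough to separate the clusters in this toy example.
scores = doublet_scores(X, n_artificial=200, k=20, n_pcs=2)
calls = call_doublets(scores, expected_doublets=10)
print("true doublets recovered:", int(calls[200:].sum()), "of 10")
```

Artificial doublets built from two transcriptionally distinct cells land between clusters, so real heterotypic doublets collect many artificial neighbours, whereas a doublet of two similar cells stays close to its parent cluster and is hard to separate from singlets. This mirrors the summary's statement that DoubletFinder identifies doublets formed from transcriptionally distinct cells. The all-pairs distance matrix keeps the sketch short but needs memory quadratic in the number of cells.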
Sample multiplexing facilitates scRNA-seq by reducing costs and artifacts such as cell doublets. However, universal and scalable sample barcoding strategies have not been described. We therefore developed MULTI-seq: multiplexing using lipid-tagged indices for single-cell and single-nucleus RNA sequencing. MULTI-seq reagents can barcode any cell type or nucleus from any species with an accessible plasma membrane. The method involves minimal sample …
Single-cell RNA sequencing (scRNA-seq) using droplet microfluidics occasionally produces transcriptome data representing more than one cell. These technical artifacts are caused by cell doublets formed during cell capture and occur at a frequency proportional to the total number of sequenced cells. The presence of doublets can lead to spurious biological conclusions, which justifies the practice of sequencing fewer cells to limit doublet formation rates. Here, we present DoubletFinder, a computational doublet detection tool that identifies doublets based solely on gene expression features. DoubletFinder infers the putative gene expression profile of real doublets by generating artificial doublets from existing scRNA-seq data. Neighborhood detection in gene expression space then identifies sequenced cells with an increased probability of being doublets based on their proximity to artificial doublets. DoubletFinder robustly identifies doublets across scRNA-seq datasets with variable numbers of cells and sequencing depths, and predicts false-negative and false-positive doublets defined using conventional barcoding approaches. We anticipate that DoubletFinder will aid in scRNA-seq data analysis and will increase the throughput and accuracy of scRNA-seq experiments.
bioRxiv preprint doi: http://dx.doi.org/10.1101/352484, first posted online Jun. 20, 2018. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
INTRODUCTION
Since its introduction nearly a decade ago, scRNA-seq has been used to elucidate previously unknown cell types and reconstruct developmental dynamics among heterogeneous cell populations (Human Cell Atlas Consortium, 2017). At first, scRNA-seq workflows were limited to tens to hundreds of cells, which hindered data interpretation due to batch effects and low statistical power (Stegle et al., 2016). Today, sequencing thousands to hundreds of thousands of cells is routine due to the advent of droplet microfluidics and nanowell-based sequencing strategies (Macosko et al., 2015; Klein et al., 2015; Zheng et al., 2017; Gierahn et al., 2017; Takara Bio USA, 2018). These techniques rely on a Poisson loading strategy to compartmentalize individual cells and mRNA capture beads before cell lysis, mRNA capture, and transcript barcoding via reverse transcription. Since cells are captured randomly, the proportion of droplets containing more than one cell (known as doublets) scales linearly across an experimentally relevant range of input cell concentrations (10X Genomics, 2017), justifying the practice of limiting the number of sequenced cells to minimize doublet formation rates.
The confounding effects of doublets in scRNA-seq data are well appreciated (Ilicic et al., 2016). However, genomic and cellular barcoding techniques for identifying doublets have only recently ...
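The Poisson loading argument in the introduction can be made concrete. Assuming the number of cells per droplet follows a Poisson distribution with mean λ (written `lam` below, a hypothetical name for the mean loading), the fraction of cell-containing droplets that hold more than one cell is f(λ) = (1 - e^(-λ) - λe^(-λ)) / (1 - e^(-λ)), which is approximately λ/2 for small λ. Doubling the loading therefore roughly doubles the doublet fraction. The sketch below compares the exact expression with that linear approximation; it is an idealized model, not a vendor's calibrated doublet-rate table.

```python
import math


def multiplet_fraction(lam):
    """Fraction of cell-containing droplets holding more than one cell.

    Assumes cells per droplet ~ Poisson(lam); `lam` is a hypothetical name for the mean loading.
    """
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    return (1.0 - p_empty - p_single) / (1.0 - p_empty)


# Compare the exact Poisson expression with its small-lambda linear approximation lam / 2.
for lam in (0.01, 0.05, 0.1, 0.2):
    exact = multiplet_fraction(lam)
    print(f"lambda={lam:<5} exact={exact:.4f} approx={lam / 2:.4f}")
```

At λ = 0.1, for instance, roughly 5% of occupied droplets carry two or more cells; the exact value sits slightly below λ/2 because the next term of the expansion is -λ²/12. Since the number of captured cells also grows in proportion to λ, this is why the doublet rate rises roughly linearly with the number of cells loaded.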
Steering the differentiation of induced pluripotent stem cells (iPSCs) toward specific cell types is crucial for patient-specific disease modeling and drug testing. This effort requires the capacity to predict and control when and how multipotent progenitor cells commit to the desired cell fate. Cell fate commitment represents a critical state transition or "tipping point" at which complex systems undergo a sudden qualitative shift. To characterize such transitions during iPSC-to-cardiomyocyte differentiation, we analyzed the expression patterns of 96 developmental genes at single-cell resolution. We identified a bifurcation event early in the trajectory, when a primitive streak-like cell population segregated into the mesodermal and endodermal lineages. Before this branching point, we could detect the signature of an imminent critical transition: an increase in cell heterogeneity and in the coordination of gene expression. Correlation analysis of gene expression profiles at the tipping point points to transcription factors that drive the state transition toward each alternative cell fate, as well as their relationships with specific phenotypic readouts. The latter facilitates small-molecule screening for differentiation efficiency. To this end, we analyzed cell population structure at the tipping point after systematically varying the protocol to bias differentiation toward the mesodermal or endodermal lineage. We were able to predict the proportion of cardiomyocytes many days before the cells manifested the differentiated phenotype. The analysis of cell populations undergoing a critical state transition thus affords a tool to forecast cell fate outcomes and can be used to optimize differentiation protocols to obtain desired cell populations.
Keywords: single-cell analysis | critical state transitions | iPSC to cardiomyocyte differentiation | differentiation efficiency | prediction
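One way to quantify the two signatures named above, rising cell-to-cell heterogeneity and rising coordination of gene expression, is to compare average correlations at each time point. The sketch below is a hypothetical illustration, not the analysis performed in the study: it takes the mean absolute gene-gene correlation as a coordination score, the mean absolute cell-cell correlation as a similarity score (lower similarity meaning higher heterogeneity), and their ratio as a single indicator. The synthetic "pre-transition" population, in which a shared latent factor drives many genes at once, is invented purely for the demonstration.

```python
import numpy as np


def mean_abs_offdiag_corr(M):
    """Mean absolute Pearson correlation between all pairs of rows of M (diagonal excluded)."""
    r = np.corrcoef(M)
    off_diagonal = ~np.eye(r.shape[0], dtype=bool)
    return np.abs(r[off_diagonal]).mean()


def transition_signature(X):
    """Summarise one time point; X is a (cells x genes) expression matrix.

    Returns (gene_coordination, cell_similarity, ratio). These names and the ratio
    are hypothetical choices for this sketch. Rising coordination and falling
    cell-cell similarity (i.e. rising heterogeneity) both increase the ratio.
    """
    gene_coordination = mean_abs_offdiag_corr(X.T)  # gene-gene correlations across cells
    cell_similarity = mean_abs_offdiag_corr(X)       # cell-cell correlations across genes
    return gene_coordination, cell_similarity, gene_coordination / cell_similarity


# Synthetic usage: a stable population versus one where a shared latent factor
# pushes many genes up or down together, cell by cell.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 96
baseline = rng.uniform(2.0, 8.0, size=n_genes)
X_stable = baseline + 0.5 * rng.normal(size=(n_cells, n_genes))
loadings = rng.normal(size=n_genes)                 # how strongly each gene follows the factor
factor = rng.normal(0.0, 2.0, size=(n_cells, 1))    # each cell's position along the factor
X_pre = X_stable + factor * loadings

sig_stable = transition_signature(X_stable)
sig_pre = transition_signature(X_pre)
print("stable:         coordination=%.2f similarity=%.2f ratio=%.2f" % sig_stable)
print("pre-transition: coordination=%.2f similarity=%.2f ratio=%.2f" % sig_pre)
```

In this toy example the pre-transition population shows higher gene-gene coordination and lower cell-cell similarity than the stable one, so the ratio rises; that is the qualitative pattern the abstract describes before the branching point.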