PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data

Song, Dongyuan; Li, Jingyi Jessica

doi:10.1101/2020.11.17.387779

Cited by 6 publications

(5 citation statements)

References 69 publications


“…In addition to selecting experimental protocols before conducting scRNA-seq experiments, a common challenge after collecting scRNA-seq data is to choose among the many available data analysis methods in an unbiased manner. For example, many algorithms have been developed for missing gene expression imputation [36,37], dimensionality reduction [38][39][40], cell clustering [41][42][43][44], rare cell type detection [45][46][47], differentially expressed gene identification [48][49][50][51][52], and trajectory inference [53][54][55][56][57]. Even though several benchmark and comparative studies have been carried out for common analysis tasks [58][59][60][61][62][63], most of them have only evaluated a subset of available computational methods using data from limited experimental protocols.…”

Section: Introduction (mentioning; confidence: 99%)

scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured

Sun, Song (2021). Genome Biol. Self-citation.


A pressing challenge in single-cell transcriptomics is to benchmark experimental protocols and computational methods. A solution is to use computational simulators, but existing simulators cannot simultaneously achieve three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill this gap, we propose scDesign2, a transparent simulator that achieves all three goals and generates high-fidelity synthetic data for multiple single-cell gene expression count-based technologies. In particular, scDesign2 is advantageous in its transparent use of probabilistic models and its ability to capture gene correlations via copulas.
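The abstract credits copulas with capturing gene correlations. The sketch below illustrates that general idea only; it is not scDesign2's actual model or code. It works in three steps for two hypothetical genes:

1. Draw correlated standard normals.
2. Map them to uniforms with the normal CDF.
3. Push each uniform through a Poisson inverse CDF.

The Gaussian copula, the Poisson marginals and every function name (`simulate_pair`, `poisson_quantile`, `pearson`) are assumptions made for this example.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_quantile(u, lam):
    """Inverse CDF of Poisson(lam): smallest k with P(X <= k) >= u.

    Summation-based, so only suitable for moderate rates (exp(-lam) must not underflow).
    """
    u = min(u, 1.0 - 1e-12)  # guard against u == 1.0 from floating-point rounding
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    while cdf < u and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_pair(lam1, lam2, rho, n, seed=0):
    """Draw n count pairs whose dependence comes from a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]] gives corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Normal CDF -> uniforms; Poisson inverse CDF -> counts with the chosen marginals.
        pairs.append((poisson_quantile(normal_cdf(z1), lam1),
                      poisson_quantile(normal_cdf(z2), lam2)))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    pairs = simulate_pair(lam1=5.0, lam2=2.0, rho=0.8, n=5000, seed=1)
    gene_a = [a for a, _ in pairs]
    gene_b = [b for _, b in pairs]
    print(sum(gene_a) / len(gene_a), sum(gene_b) / len(gene_b))
    print(pearson(gene_a, gene_b))
```

Each count comes from a uniform passed through its own inverse CDF, so the marginal means stay at the chosen rates. The copula correlation `rho` separately controls how strongly the two genes co-vary. With `rho = 0` the genes are simulated independently.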



“…This issue is evidenced by serious concerns about the widespread miscalculation and misuse of p-values in the scientific community [30]. As a result, bioinformatics tools using questionable p-values either cannot reliably control the FDR to a target level [23] or lack power to make discoveries [31]; see Results. Therefore, p-value-free control of FDR is desirable, as it would make data analysis more transparent and thus improve the reproducibility of scientific research.…”

Section: Results (mentioning; confidence: 99%)

Clipper: p-value-free FDR control on high-throughput data from two conditions

Chen, Song, et al. (2020). Preprint. Self-citation.


High-throughput biological data analysis commonly involves the identification of “interesting” features (e.g., genes, genomic regions, and proteins), whose values differ between two conditions, from numerous features measured simultaneously. To ensure the reliability of such analysis, the most widely-used criterion is the false discovery rate (FDR), the expected proportion of uninteresting features among the identified ones. Existing bioinformatics tools primarily control the FDR based on p-values. However, obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions, two requirements that are often unmet in biological studies. To address this issue, we propose Clipper, a general statistical framework for FDR control without relying on p-values or specific data distributions. Clipper is applicable to identifying both enriched and differential features from high-throughput biological data of diverse types. In comprehensive simulation and real-data benchmarking, Clipper outperforms existing generic FDR control methods and specific bioinformatics tools designed for various tasks, including peak calling from ChIP-seq data, differentially expressed gene identification from RNA-seq data, differentially interacting chromatin region identification from Hi-C data, and peptide identification from mass spectrometry data. Notably, our benchmarking results for peptide identification are based on the first mass spectrometry data standard that has a realistic dynamic range. Our results demonstrate Clipper’s flexibility and reliability for FDR control, as well as its broad applications in high-throughput data analysis.
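The abstract contrasts p-value-based FDR control with a p-value-free alternative. The sketch below sets the two side by side, under stated assumptions:

- `benjamini_hochberg` is the standard Benjamini–Hochberg step-up rule on p-values.
- `barber_candes` thresholds per-feature contrast scores with the Barber–Candès (knockoff+-style) rule. In this rule, negative scores stand in for false positives.

Using a difference of condition means as the contrast score is an illustrative assumption, and so are the function names. This is not Clipper's published implementation.

```python
def difference_contrast(cond1, cond2):
    """Contrast score per feature: mean under condition 2 minus mean under condition 1 (assumed form)."""
    return [sum(b) / len(b) - sum(a) / len(a) for a, b in zip(cond1, cond2)]


def benjamini_hochberg(pvals, q):
    """Benjamini-Hochberg step-up rule: indices of rejected hypotheses at target FDR q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank  # largest rank whose p-value clears its BH line
    return sorted(order[:k_max])


def barber_candes(scores, q):
    """Barber-Candes threshold on contrast scores (no p-values needed).

    Negative scores act as a proxy for false positives: the estimated FDP at
    threshold t is (1 + #{C_j <= -t}) / max(1, #{C_j >= t}).
    """
    for t in sorted({abs(c) for c in scores if c != 0}):
        false_proxy = 1 + sum(1 for c in scores if c <= -t)
        discoveries = max(1, sum(1 for c in scores if c >= t))
        if false_proxy / discoveries <= q:
            return sorted(j for j, c in enumerate(scores) if c >= t)
    return []


if __name__ == "__main__":
    # Toy contrast scores: large positive values suggest "interesting" features.
    scores = [5, 4, 3, 2.5, -0.5, 0.2, -0.1, 0]
    print(barber_candes(scores, q=0.25))  # -> [0, 1, 2, 3]
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5], q=0.05))  # -> [0]
```

The `+ 1` in the Barber–Candès estimate means that at least `1 / q` features must pass the threshold before anything is reported. This is why the toy scores yield four discoveries at `q = 0.25` and none at `q = 0.1`.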


“…However, there is a growing need to generalize our framework to identify features across more than two conditions. For example, temporal analysis of scRNA-seq data aims to identify genes whose expression levels change along cell pseudotime [31]. To tailor Clipper for such analysis, we could define a new contrast score that differentiates the genes with stationary expression (uninteresting features) from the other genes with varying expression (interesting features).…”

Section: Discussion (mentioning; confidence: 99%)

“…This issue is evidenced by serious concerns about the widespread miscalculation and misuse of p-values in the scientific community [30]. As a result, bioinformatics tools using questionable p-values either cannot reliably control the FDR to a target level [23] or lack power to make discoveries [31]; see the "Results" section. Therefore, p-value-free control of FDR is desirable, as it would make data analysis more transparent and thus improve the reproducibility of scientific research.…”

Section: Introduction (mentioning; confidence: 99%)

Clipper: p-value-free FDR control on high-throughput data from two conditions

Chen, Song, et al. (2021). Genome Biol. Self-citation.


High-throughput biological data analysis commonly involves identifying features such as genes, genomic regions, and proteins, whose values differ between two conditions, from numerous features measured simultaneously. The most widely used criterion to ensure the analysis reliability is the false discovery rate (FDR), which is primarily controlled based on p-values. However, obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions. Clipper is a general statistical framework for FDR control without relying on p-values or specific data distributions. Clipper outperforms existing methods for a broad range of applications in high-throughput data analysis.
