Naught all zeros in sequence count data are the same

Silverman, Justin D.; Roche, Kimberly; Mukherjee, Sayan; David, Lawrence A.

doi:10.1101/477794

Cited by 56 publications

(83 citation statements)

References 82 publications

(66 reference statements)


“…Excessive zeros in microbiome studies are common and can potentially skew data (41,73). Zeros come in multiple forms (74). Outlier zeros are due to extraneous conditions, structural zeros are due to the nature of different experimental groups, and sampling zeros are any other zero that may be due to low sampling depth.…”

Section: Discussion (mentioning)

confidence: 99%

The rodent vaginal microbiome across the estrous cycle and the effect of genital nerve electrical stimulation

Levy, Bassis, Kennedy, et al. 2019. Preprint


Treatment options are limited for the approximately 40% of postmenopausal women worldwide who suffer from female sexual dysfunction (FSD). Neural stimulation has shown potential as a treatment for genital arousal FSD, however the mechanisms for its improvement are unknown. One potential cause of some cases of genital arousal FSD are changes to the composition of the vaginal microbiome, which is associated with vulvovaginal atrophy. The primary hypothesis of this study was that neural stimulation may induce healthy changes in the vaginal microbiome, thereby improving genital arousal FSD symptoms. This study also sought to examine the composition of the rat vaginal microbiome, which is understudied. Nulliparous female rats were used. Treatment animals (n=5) received 30 minutes of cutaneous electrical stimulation targeting the genital branch of the pudendal nerve, and Control animals (n=4) had 30-minute sessions without stimulation. Vaginal lavage samples were taken during a 14-day baseline period including multiple estrous periods and after twice-weekly 30-minute sessions across a six-week trial period. Samples were sequenced at the University of Michigan Host Microbiome Initiative and analyzed for baseline bacterial trends and for changes due to stimulation. We found that the rat vaginal microbiome is dominated by Proteobacteria, Firmicutes, and Actinobacteria phyla bacteria, which changed in abundance during the estrous cycle and in relationship to each other. While the overall stimulation effects were unclear, some Treatment animals had lower variance in the microbiome diversity for sequential samples than Control animals, suggesting that stimulation may help normalize the vaginal microbiome. Future studies may consider additional physiological parameters, in addition to the microbiome composition, to further examine vaginal health and the effects of stimulation.



“…Relative abundance is given by $\pi_{ij} = x_{ij}/t_i$ where total transcripts $t_i = \sum_j x_{ij}$. Since $n_i \ll t_i$, there is a "competition to be counted" [33]; genes with large relative abundance $\pi_{ij}$ in the original cell are more likely to have nonzero UMI counts, but genes with small relative abundances may be observed with UMI counts of exact zeros. The UMI counts $y_{ij}$ are a multinomial sample of the true biological counts $x_{ij}$, containing only relative information about expression patterns in the cell [34,33].…”

Section: Multinomial Sampling Distribution For UMI Counts (mentioning)

confidence: 99%

Feature Selection and Dimension Reduction for Single Cell RNA-Seq based on a Multinomial Model

Townes, Hicks, Aryee, et al. 2019. Preprint


Single cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero-inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.
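The "competition to be counted" in the citation statement above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the gene counts, the UMI depth of 200, and the helper name `simulate_umi_counts` are hypothetical and do not come from the cited paper. It draws UMI counts as a multinomial sample whose probabilities are the relative abundances (true count divided by total transcripts).

```python
import numpy as np

def simulate_umi_counts(true_counts, n_umis, seed=0):
    """Draw n_umis UMIs from one cell as a multinomial sample.

    true_counts: hypothetical true transcript counts x_ij for one cell.
    n_umis: hypothetical total UMI count for that cell (far below the
            total number of transcripts).
    """
    rng = np.random.default_rng(seed)
    true_counts = np.asarray(true_counts, dtype=float)
    # Relative abundance: pi_ij = x_ij / t_i, with t_i the sum over genes.
    pi = true_counts / true_counts.sum()
    return rng.multinomial(n_umis, pi)

# Hypothetical cell: 3 highly expressed genes and 50 lowly expressed genes.
x = np.array([5000, 3000, 2000] + [2] * 50)
y = simulate_umi_counts(x, n_umis=200)

print("zeros among high-abundance genes:", int((y[:3] == 0).sum()))
print("zeros among low-abundance genes:", int((y[3:] == 0).sum()))
```

Every low-abundance gene here is truly present in the cell, so any zero it receives is a sampling zero rather than evidence of absence. No zero-inflation component is involved in generating them.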


“…For example, whereas under (39) the mean of is (1 − ) , under (40) it is . More generally, these two different interpretations could lead to different inferences about e.g., differential expression or clustering 66 . We argue that both theory and empirical evidence support the use of the Poisson measurement model, and not the ZIP measurement model, and that therefore analyses using the ZINB observation model should be derived and interpreted using (39) rather than (40).…”

Section: Appendix D Identifiability Of Measurement and Expression Mo… (mentioning)

confidence: 99%

Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis

Sarkar, Stephens 2020. Preprint


How to model and analyze scRNA-seq data has been the subject of considerable confusion and debate. The high proportion of zero counts in a typical scRNA-seq data matrix has garnered particular attention, and led to widespread but inconsistent use of terminology such as "dropout" and "missing data." Here, we argue that much of this terminology is unhelpful and confusing, and outline simple ways of thinking about models for scRNA-seq data that can help avoid this confusion. The key ideas are: (1) observed scRNA-seq counts reflect both the actual expression level of each gene in each cell and the measurement process, and it is important for models to explicitly distinguish contributions from these two distinct factors; and (2) the measurement process can be adequately described by a simple Poisson model, a claim for which we provide both theoretical and empirical support. We show how these ideas lead to a simple, flexible statistical framework that encompasses a number of commonly used models and analysis methods, and how this framework makes explicit their different assumptions and helps interpret their results. We also illustrate how explicitly separating models for expression and measurement can help address questions of biological interest, such as whether mRNA expression levels are multi-modal among cells.
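The separation of an expression model from a Poisson measurement model can be made concrete with a short simulation. This is a minimal sketch, not the authors' implementation: the Gamma expression model and all parameter values (`shape`, `scale`, `size_factor`, `n_cells`) are assumptions chosen for illustration. It shows that a plain Poisson measurement step applied to variable expression levels already yields a large share of zeros, matching the negative binomial zero probability, with no zero-inflation term.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cells = 100_000          # hypothetical number of cells
size_factor = 1.0          # hypothetical per-cell sequencing depth factor
shape, scale = 0.5, 2.0    # hypothetical Gamma expression model (mean = 1.0)

# Expression model: latent expression level of one gene in each cell.
lam = rng.gamma(shape, scale, size=n_cells)

# Measurement model: observed counts are Poisson given the latent level.
counts = rng.poisson(size_factor * lam)

# A Gamma-Poisson mixture is negative binomial, whose zero probability is
# (1 + size_factor * scale) ** (-shape).
observed_zero_frac = (counts == 0).mean()
expected_zero_frac = (1.0 + size_factor * scale) ** (-shape)

print(f"observed zero fraction: {observed_zero_frac:.3f}")
print(f"expected under Gamma-Poisson: {expected_zero_frac:.3f}")
```

Under these assumed parameters more than half of the counts are zero even though the mean expression is 1.0, so a high zero fraction alone does not indicate that the measurement process is zero-inflated.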

