DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics

Nowicka, Małgorzata; Robinson, Mark D.

doi:10.12688/f1000research.8900.2

Cited by 156 publications (144 citation statements, 130 of them mentioning)

References 61 publications

“…For instance, Fordyce, Gompert, Forister, and Nice (2011) rely on Dirichlet‐multinomial modelling (DMM) to analyze ecological count data, such as counts of behavioural and dietary choices of animals (also see Coblentz, Rosenblatt, & Novak, 2017). Similar models have been applied to large counts of DNA sequences: for instance, Fernandes et al. (2014; aldex2), Nowicka and Robinson (2016; drim‐seq), and Rosa et al. (2012; hmp) use DMM to estimate and compare feature‐specific relative abundances in transcriptomes and microbiomes. Additionally, DMM has been used to model mixtures of compositions, a situation that could arise in a laboratory‐derived microbial assemblage occurring as a contaminant within samples, or in mixtures of different communities in nature (microbedmm, Holmes, Harris, & Quince; sourcetracker, Knights et al.; biomico, Shafiei et al.; feast, Shenhav et al.; ecostructure, White, Dey, Mohan, Stephens, & Price).…”

Section: Introduction (mentioning; confidence: 99%)

“…For instance, Fordyce, Gompert, Forister, and Nice (2011) rely on Dirichlet-multinomial modelling (DMM) to analyze ecological count data, such as counts of behavioural and dietary choices of animals (also see Coblentz, Rosenblatt, & Novak, 2017). Similar models have been applied to large counts of DNA sequences: for instance, Fernandes et al. (2014; aldex2), Nowicka and Robinson (2016; drim-seq), and Rosa et al. (2012; hmp) … Consequently, we conducted a simulation experiment to learn the limits and benefits of DMM through the analysis of data that encompass much of the variety in attributes encountered across scientific domains (e.g., replication, number of observations, and so on; Figure 2).…”

Section: Introduction (mentioning; confidence: 99%)


Dirichlet‐multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data

Harrison, Calder, Shastry, et al. (2020)

Molecular Ecology Resources


Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modelled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups. This type of Dirichlet‐multinomial modelling (DMM) has been described previously, but its benefits and limitations are largely untested. With simulated data, we quantified the ability of DMM to detect differences in proportions between treatment and control groups, and compared the efficacy of three computational methods to implement DMM: Hamiltonian Monte Carlo (HMC), variational inference (VI), and Gibbs Markov chain Monte Carlo. We report that DMM was better able to detect shifts in relative abundances than analogous analytical tools, while identifying an acceptably low number of false positives. Among methods for implementing DMM, HMC provided the most accurate estimates of relative abundances, and VI was the most computationally efficient. The sensitivity of DMM was exemplified through analysis of previously published data describing lung microbiomes. We report that DMM identified several potentially pathogenic bacterial taxa as more abundant in the lungs of children who aspirated foreign material during swallowing; these differences went undetected by other statistical approaches. Our results suggest that DMM has strong potential as a statistical method to guide inference in molecular ecology.
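The two-level hierarchy described in this abstract (replicate-level proportions drawn from a Dirichlet, then counts drawn from a multinomial given those proportions) can be illustrated with a short Python sketch. It is not code from DRIMSeq or from Harrison et al.; the proportion/precision parameterization α_k = γ0·π_k, the NumPy dependency, and the function names `simulate_dm` and `dm_log_likelihood` are assumptions made for illustration. The log-likelihood is the standard closed form obtained by integrating the Dirichlet out of the multinomial.

```python
import math

import numpy as np


def dm_log_likelihood(counts, pi, gamma0):
    """Log-probability of one count vector under a Dirichlet-multinomial.

    Assumed parameterization (illustrative): concentration
    alpha_k = gamma0 * pi_k, where pi are feature proportions summing to 1
    and gamma0 > 0 is a precision; smaller gamma0 means more overdispersion.
    """
    counts = np.asarray(counts, dtype=float)
    alpha = gamma0 * np.asarray(pi, dtype=float)
    n = counts.sum()
    a0 = alpha.sum()
    # Multinomial coefficient: log n! - sum_k log y_k!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    # Dirichlet integrated out:
    # Gamma(a0) / Gamma(n + a0) * prod_k Gamma(y_k + a_k) / Gamma(a_k)
    log_mix = math.lgamma(a0) - math.lgamma(n + a0)
    log_mix += sum(math.lgamma(y + a) - math.lgamma(a) for y, a in zip(counts, alpha))
    return log_coef + log_mix


def simulate_dm(rng, pi, gamma0, depth, n_replicates):
    """Simulate replicate count vectors from the two-level hierarchy:
    proportions ~ Dirichlet(gamma0 * pi), counts ~ Multinomial(depth, proportions).
    """
    alpha = gamma0 * np.asarray(pi, dtype=float)
    props = rng.dirichlet(alpha, size=n_replicates)  # one proportion vector per replicate
    return np.array([rng.multinomial(depth, p) for p in props])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pi = [0.5, 0.3, 0.2]  # hypothetical feature proportions
    counts = simulate_dm(rng, pi, gamma0=20.0, depth=1000, n_replicates=4)
    print(counts)  # 4 replicates x 3 features; each row sums to 1000
    print(sum(dm_log_likelihood(y, pi, 20.0) for y in counts))
```

Because the Dirichlet is integrated out, each count's marginal variance exceeds the multinomial variance by a factor (n + γ0)/(1 + γ0); as γ0 grows, the model approaches a plain multinomial.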


“…Analysis of RNA-seq data for most biologists is a bottleneck because of reliance on the skills of often over-stretched bioinformaticians who are needed to process large datasets and apply complex analytical programs to experimental data. Many RNA-seq differential analysis programs do not have the flexibility to handle complex experimental designs (such as time-course or developmental series data) and are error-prone (Love et al., 2014; Hardcastle and Kelly, 2010; Anders et al., 2012; Nowicka and Robinson, 2016). Results can be inconsistent due to the use of multiple different combinations of tools or pipelines by different bioinformaticians.…”

Section: Introduction (mentioning; confidence: 99%)

A powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists

Guo, Tzioutziou, Stephen, et al. (2019)

Preprint


RNA-sequencing (RNA-seq) analysis of gene expression and alternative splicing should be routine and robust but is often a bottleneck for biologists because of the many different, complex analysis programs and reliance on skilled bioinformaticians to perform the analysis. To overcome these issues, we have developed the "3D RNA-seq" App, an R Shiny App which provides an easy-to-use, flexible and powerful tool for the three-way differential analysis: Differential Expression (DE), Differential Alternative Splicing (DAS) and Differential Transcript Usage (DTU) of RNA-seq data. The full analysis is extremely rapid and can be done within hours. The program integrates Limma, a state-of-the-art, highly rated differential expression analysis tool, and adopts best practice for RNA-seq analysis. It runs the analysis through a user-friendly graphical interface, can handle complex experimental designs, allows users to set statistical parameters, visualizes the results through graphics and tables, and generates publication-quality figures such as heat-maps, expression profiles and GO enrichment plots. The utility of 3D RNA-seq is illustrated by analysis of Arabidopsis and mouse RNA-seq data. The program is designed to be run by biologists with minimal bioinformatics experience (or by bioinformaticians), allowing lab scientists to take control of the analysis of their RNA-seq data.
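Differential transcript usage (DTU), one of the three analyses named above, concerns the share of a gene's expression carried by each transcript rather than the gene's total. The Python sketch below is a hypothetical illustration with made-up counts and hypothetical function names; it is not the 3D RNA-seq implementation, which according to the abstract builds on Limma. It shows a gene whose total count is unchanged between conditions while its transcript proportions shift.

```python
def transcript_usage(counts):
    """Share of a gene's counts carried by each transcript (proportions sum to 1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("gene has no counts in this sample")
    return {tx: c / total for tx, c in counts.items()}


def usage_shift(cond_a, cond_b):
    """Per-transcript change in usage from condition A to condition B.

    Assumes both conditions list the same transcripts.
    """
    ua, ub = transcript_usage(cond_a), transcript_usage(cond_b)
    return {tx: ub[tx] - ua[tx] for tx in ua}


if __name__ == "__main__":
    # Hypothetical pooled counts for one gene with two transcripts.
    control = {"tx1": 80, "tx2": 20}
    treated = {"tx1": 30, "tx2": 70}
    # Gene-level totals are identical (100 vs 100), so total expression is
    # unchanged, yet transcript usage changes substantially.
    print(sum(control.values()), sum(treated.values()))
    print(usage_shift(control, treated))  # tx1 falls by about 0.5, tx2 rises by about 0.5
```

A real DTU test would also have to model replicate-to-replicate variability in these proportions (for example, with a Dirichlet-multinomial model as in DRIMSeq) instead of comparing pooled counts.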


“…DEXSeq, on the other hand, utilizes a Negative Binomial distribution to model the counts per exon. It was originally targeted at identifying differential exon usage, but has also been evaluated in the context of transcript usage (Soneson and others, 2016; Nowicka and Robinson, 2016; Love and others, 2018). SUPPA2 uses biological replicates to estimate differences in isoform proportions across conditions and between biological replicates.…”

Section: Introduction (mentioning; confidence: 99%)

ACTOR: a latent Dirichlet model to compare expressed isoform proportions to a reference panel

McCabe, Nobel, and Love (2019)

Preprint


The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public datasets produced by genomic consortia as a reference, one can compare splicing patterns in a dataset of interest with those of a reference panel in which samples are divided into distinct groups (tissue of origin, disease status, etc.). We propose ACTOR, a latent Dirichlet model with Dirichlet-multinomial observations to compare expressed isoform proportions in a dataset to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression (GTEx) project as a reference dataset, we evaluate ACTOR on simulated and real RNA-seq datasets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
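ACTOR itself fits a latent Dirichlet model by variational Bayes; the sketch below does not reproduce that procedure. It only illustrates the ingredient this abstract names, Dirichlet-multinomial observations of isoform counts, by computing an exact posterior over reference groups for one sample and one gene, under the simplifying assumption that each group's DM concentration vector is already known. The group names, parameter values, and the functions `dm_log_pmf` and `group_posterior` are hypothetical.

```python
import math


def dm_log_pmf(counts, alpha):
    """Dirichlet-multinomial log-probability of counts given concentrations alpha."""
    n = sum(counts)
    a0 = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for y, a in zip(counts, alpha):
        out += math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1)
    return out


def group_posterior(counts, group_alphas, prior=None):
    """Posterior probability that one sample's isoform counts come from each
    reference group, assuming each group's DM concentrations are known.

    Applies Bayes' rule directly; this is NOT ACTOR's variational Bayes fit.
    """
    groups = list(group_alphas)
    if prior is None:
        prior = {g: 1.0 / len(groups) for g in groups}  # uniform prior over groups
    log_post = [math.log(prior[g]) + dm_log_pmf(counts, group_alphas[g]) for g in groups]
    m = max(log_post)  # subtract the maximum (log-sum-exp) to avoid underflow
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return {g: w / z for g, w in zip(groups, weights)}


if __name__ == "__main__":
    # Hypothetical reference panel for one gene with three isoforms.
    reference = {
        "tissue_A": [40.0, 5.0, 5.0],  # mostly isoform 1
        "tissue_B": [5.0, 40.0, 5.0],  # mostly isoform 2
    }
    print(group_posterior([90, 8, 2], reference))  # nearly all mass on tissue_A
```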
