2018
DOI: 10.1186/s12864-018-5160-5

Analysis and correction of compositional bias in sparse sequencing count data

Abstract: Background: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or …

Cited by 81 publications (58 citation statements)
References 60 publications
“…A variety of pipelines is available to then get a relative abundance table of operational taxonomic units (OTUs; Sun et al, 2011). While the choice of the pipeline can certainly affect the results, the training set (Werner et al, 2012) and the method of normalization (Kumar et al, 2018) also have a major impact even though it is seldom discussed. The most common normalization procedure consists in dividing by the total number of reads in order to obtain the proportion of each OTU.…”
Section: Introduction (mentioning)
confidence: 99%
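
The total-read normalization described in the statement above, and the compositional effect the cited paper's abstract warns about, can be illustrated with a minimal sketch. The function name `total_sum_scale` and the toy abundances are hypothetical, chosen only for illustration; they are not taken from the cited paper or from any pipeline.

```python
def total_sum_scale(counts):
    """Divide each count by the sample's total so the values sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


# Hypothetical absolute abundances of three OTUs in two samples.
# Only the first OTU differs between the samples.
sample_a = [100, 100, 100]
sample_b = [1000, 100, 100]

props_a = total_sum_scale(sample_a)
props_b = total_sum_scale(sample_b)

# The second and third OTUs are unchanged in absolute terms, yet their
# proportions fall because the first OTU takes a larger share of the total.
for name, props in (("a", props_a), ("b", props_b)):
    print(name, [round(p, 3) for p in props])
```

In this toy case the proportions of the two unchanged OTUs drop from one third to one twelfth, which is the kind of spurious shift that proportions alone cannot distinguish from a real decrease.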
“…Secondly, some normalizations, such as the geometric mean method implemented in DESeq2 or the trimmed mean of M-values of edgeR, have size factors mathematically equivalent or very similar to the compositional log-ratio proposed by Aitchison 24,36. This has been shown to reduce the impact of compositionality on DA results 37. We did not test the ANCOM package 38 because it was too slow for assessment in the simulation studies.…”
Section: As Shown In (mentioning)
confidence: 99%
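
The relationship this statement describes between geometric-mean size factors and the centered log-ratio (CLR) can be sketched as follows. This is a simplified illustration written from the textbook definitions, not the DESeq2 or edgeR implementation; the function names are hypothetical, and the toy counts are assumed strictly positive so that logarithms are defined (real sparse data needs explicit zero handling).

```python
import math
from statistics import median


def clr(sample):
    """Centered log-ratio: log count minus the sample's mean log count."""
    logs = [math.log(x) for x in sample]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


def median_of_ratios_size_factors(samples):
    """Per-sample size factor: median ratio of each count to the
    geometric mean of that feature across all samples."""
    n_features = len(samples[0])
    log_geo_means = [
        sum(math.log(s[i]) for s in samples) / len(samples)
        for i in range(n_features)
    ]
    return [
        math.exp(median(math.log(s[i]) - log_geo_means[i]
                        for i in range(n_features)))
        for s in samples
    ]


# Hypothetical counts: the second sample is the first sequenced twice as deep.
samples = [[10, 20, 30, 40], [20, 40, 60, 80]]

size_factors = median_of_ratios_size_factors(samples)
clr_values = [clr(s) for s in samples]

print(size_factors[1] / size_factors[0])  # ratio of the two size factors
print(clr_values[0])
print(clr_values[1])
```

Both quantities are built from log counts centered on a geometric mean: the size factors absorb the twofold depth difference, and the CLR values of the two samples coincide because the transform is invariant to rescaling a sample.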
“…Important features include the level of excess zeros in the samples as well as the sequencing platform used (Figure 3B-D). The strong influence of the number of zeros in the samples may be alleviated by improved scale estimations prior to data transformations [29] or zero-aware rank-based transformations [54]. However, we also recognized that many inferred latent components for the AGP data did not correlate with any of the measured covariates.…”
Section: Discussion (mentioning)
confidence: 98%