2024
DOI: 10.1101/2024.04.01.587602
Preprint

Beyond Normalization: Incorporating Scale Uncertainty in Microbiome and Gene Expression Analysis

Michelle Pistner Nixon,
Gregory B. Gloor,
Justin D. Silverman

Abstract: Though statistical normalizations are often used in differential abundance or differential expression analysis to address sample-to-sample variation in sequencing depth, we offer a better alternative. These normalizations often make strong, implicit assumptions about the scale of biological systems (e.g., microbial load). Thus, analyses are susceptible to even slight errors in these assumptions, leading to elevated rates of false positives and false negatives. We introduce scale models as a generalization of n…

Cited by 4 publications (9 citation statements)
References 43 publications (86 reference statements)
“…Occasionally, researchers will analyze ratios between glycans across conditions, perhaps out of the correct, but tacit, intuition that this mitigates the compositional nature of the data. We maintain here that this compositional nature is a major problem in the field and will become worse as sample sizes increase, because of unacknowledged bias [10,14], leading to incredible false-positive rates of >30% even at rather modest sample sizes (Fig. 1b).…”
Section: Analyzing Comparative Glycomics Data As Non-compositional Da... (mentioning)
confidence: 89%
“…In the context of glycomics data, changes in relative abundances are often cited as potential disease biomarkers [7,8]. Yet not accounting for the compositional nature of data is a major contributor to divergent results of differential expression methods [9] and, in general, results in very high false-positive rates [10,11]. Traditional comparative glycomics analyses typically ignore this compositional quality, leading to spurious interpretations and false-positive results, such as perceived decreases or increases in glycan quantities that are artifacts of relative abundance changes rather than absolute ones.…”
Section: Introduction (mentioning)
confidence: 99%
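
The compositional artifact described in the statement above can be illustrated with a toy calculation. This is a minimal sketch using hypothetical numbers (three made-up features A, B, and C, not data from any of the cited papers): only feature A changes in absolute terms, yet after converting to relative abundances B and C appear to decrease.

```python
def to_relative(abundances):
    """Convert absolute abundances to proportions that sum to 1."""
    total = sum(abundances)
    return [a / total for a in abundances]


# Hypothetical absolute abundances for features A, B, C (illustrative only).
names = ["A", "B", "C"]
control = [100.0, 100.0, 100.0]
treated = [400.0, 100.0, 100.0]  # only A changes in absolute terms

rel_control = to_relative(control)
rel_treated = to_relative(treated)

for name, a0, a1, r0, r1 in zip(names, control, treated, rel_control, rel_treated):
    # B and C keep the same absolute abundance, yet their proportions drop
    # because A takes up a larger share of the total.
    print(f"{name}: absolute {a0:.0f} -> {a1:.0f}, relative {r0:.3f} -> {r1:.3f}")
```

In this toy case the total abundance (the scale) doubles between conditions, so any analysis that sees only proportions would need an assumption about that scale to tell the real change in A apart from the apparent change in B and C.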
“…The partial dependence profiles in Fig C in S2 Text suggests that mirror statistics could be a promising approach for FDR-guaranteed inference of microbe interaction networks, and future systematic experiments should explore their validity and power. Our simulation study revealed that forecasting performance depends on normalization strategy, and identifying the optimal normalization for transfer function modeling or understanding whether it is possible to bypass it entirely is an open problem [39,40]. The construction of mirror statistics via partial dependence profiles depends only on having access to a simulator f that can generate hypothetical responses, and it could be used to contrast alternative initial states y_t or host features z.…”
Section: Discussion (mentioning)
confidence: 99%