2020
DOI: 10.1093/nargab/lqaa100

Shrinkage improves estimation of microbial associations under different normalization methods

Abstract: Estimation of statistical associations in microbial genomic survey count data is fundamental to microbiome research. Experimental limitations, including count compositionality, low sample sizes and technical variability, obstruct standard application of association measures and require data normalization prior to statistical estimation. Here, we investigate the interplay between data normalization, microbial association estimation and available sample size by leveraging the large-scale American Gut Project (AG…

Cited by 25 publications (43 citation statements)
References 56 publications (68 reference statements)
“…Correspondingly, investigations of the functional capacities of the core and rare species biosphere are important to gain insights into the more stable part of community airway ecology in infancy. We thus investigated the functional capacity of core and rare background species based on the reference genomes for which uniform read distributions were obtained with raspir [30]. The tool gapseq utilised a pre-defined protein sequence reference pool for every sub-reaction involved in known core metabolic pathways of bacteria (n = 1779) [34].…”
Section: Results (mentioning)
confidence: 99%
“…Our recent software publication raspir in combination with gapseq, a tool introduced by Zimmermann et al. (2021), facilitated the taxonomic and functional identification of core and rare species from shotgun metagenomic sequencing data and reference genomes, respectively, with reduced false discovery and omission rates [27], [28]. Since previous reports have demonstrated that metagenome investigations are affected by the reference database of choice [29] and the normalisation strategy of count data for addressing the compositional behaviour of microbiome sequencing data [30], [31], [32], [33], we tested our model simulations, random forest bootstrapping aggregations, ecological network analysis and kernel-based machine learning applications on infant metagenome datasets, generated from read alignments towards either a pan-genome or a one-strain-per-species reference database. Moreover, we generated datasets based on three different read count normalisation strategies, namely variance-stabilising transformations (VST), relative log expression (RLE) and bacterial to human cell ratios (BCPHC) and worked with three distinct rarity thresholds (15th, 25th and 35th species abundance percentile) to define the core and rare species biosphere.…”
Section: Introduction (mentioning)
confidence: 99%
“…Normalization techniques are required to make read counts comparable across different samples [47, 48] (step 1c in Figure 1). The normalization approaches included in are summarized in Table 2.…”
Section: Network Construction and Characterization (mentioning)
confidence: 99%
“…The normalization approaches included in are summarized in Table 2. A description of these methods is available in [47, 48]. Note that forcing the read counts of each sample to a unique sum (as done with total sum scaling) does not change the compositional structure.…”
Section: Network Construction and Characterization (mentioning)
confidence: 99%
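
The remark in the statement above about total sum scaling can be checked with a small sketch. This is only an illustration under stated assumptions: the read counts are made up, and the helper names `total_sum_scaling` and `pairwise_ratios` are hypothetical, not taken from the cited paper or any of the tools it describes. It treats the ratios between taxa as the compositional information and shows that dividing by the sample total leaves them unchanged.

```python
from math import isclose


def total_sum_scaling(counts):
    """Divide each count by the sample total so the values sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample must contain at least one read")
    return [c / total for c in counts]


def pairwise_ratios(values):
    """Ratios between every ordered pair of taxa with a non-zero denominator."""
    return {
        (i, j): values[i] / values[j]
        for i in range(len(values))
        for j in range(len(values))
        if i != j and values[j] != 0
    }


# Hypothetical read counts for four taxa in one sample (illustrative only).
sample = [120, 30, 6, 44]

scaled = total_sum_scaling(sample)
before = pairwise_ratios(sample)
after = pairwise_ratios(scaled)

# The scaled sample sums to one, yet every taxon-to-taxon ratio is unchanged.
ratios_preserved = all(isclose(before[k], after[k]) for k in before)
print("sum after scaling:", sum(scaled))
print("ratios preserved:", ratios_preserved)
```

Because only the ratios carry information in such data, rescaling to a fixed sum makes samples of different sequencing depth comparable without removing the compositional constraint itself.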