2020
DOI: 10.1101/2020.09.09.290049
Preprint

To rarefy or not to rarefy: Enhancing diversity analysis of microbial communities through next-generation sequencing and rarefying repeatedly

Abstract: The application of amplicon sequencing in water research provides a rapid and sensitive technique for microbial community analysis in a variety of environments ranging from freshwater lakes to water and wastewater treatment plants. It has revolutionized our ability to study DNA collected from environmental samples by eliminating the challenges associated with lab cultivation and taxonomic identification. DNA sequencing data consist of discrete counts of sequence reads, the total number of which is the library …

Citations: cited by 23 publications (26 citation statements)
References: 86 publications (192 reference statements)
“…To rarefy microbiome data is an ongoing scientific discussion (148)(149)(150)(151) and here we use both alternative normalization techniques and rarefaction. For the majority of the analyses, after investigating library sizes, rarefaction curves and to maintain sufficient replication across all sites (Figure S17), we utilized raw read counts, proportions, centered log-ratio, or Hellinger transformations on the data as appropriate when performing statistics and generating visualizations.…”
Section: Downloaded From (mentioning)
confidence: 99%
“…Controls and low read count samples (only one sample, DSUK182) were removed, followed by rarefaction (without replacement) to 90% of the minimum sample read count (7,826 reads per sample) to normalise the library size across the samples. DNA sequencing data consist of discrete counts of sequence reads and the total number of which is the library size (Cameron et al, 2020). Library sizes can vary greatly between samples and thus the samples were normalised to remove bias and false inferences due to variations in library size.…”
Section: Sequencing and Data Processing (mentioning)
confidence: 99%
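
The normalisation described in this statement (rarefying without replacement to 90% of the minimum library size) can be sketched as follows. This is a minimal illustration and not the citing authors' pipeline: the toy count table, the `rarefy` helper, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample's library size")
    # Expand the count vector into one label per read, then draw without replacement.
    pool = [variant for variant, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(rng.sample(pool, depth))
    return [drawn.get(variant, 0) for variant in range(len(counts))]


# Hypothetical toy table: rows are samples, columns are sequence variants.
table = [
    [120, 30, 0, 50],
    [400, 10, 90, 0],
    [60, 60, 60, 20],
]

library_sizes = [sum(row) for row in table]
# 90% of the smallest library, computed with integer arithmetic.
depth = min(library_sizes) * 9 // 10

rng = random.Random(0)  # fixed seed only so the sketch is reproducible
rarefied = [rarefy(row, depth, rng) for row in table]
```

After this step every row of `rarefied` sums to the same depth, and no variant count exceeds its original value because reads are drawn without replacement.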
“…Because the extent to which zeros compromised accurate estimation waned with increasing library size (Figure 1), a similar analysis was performed on amplicon sequencing data for six water samples from lakes. The samples (Cameron et al, 2020) featured library sizes between 10,000 and 30,000 and observation of 1,142 unique variants among the samples. All singleton counts had been zeroed and the completed ASV table had 3,342 rows (2,200 of which are all zeros associated with variants detected in other samples from the same study area).…”
Section: Probabilistic Inference Of Source Shannon Index Using Bayesian Methods (mentioning)
confidence: 99%
“…Inference about source diversity is the ideal, but it is not possible with a multinomial relative abundance model unless the number of unique variants in the source is precisely known and there are many types of error in amplicon sequencing that are likely to invalidate this foundational model as discussed above. Rarefying repeatedly, a subsampling process to normalize library sizes among samples that is performed many times in order to characterize the variability introduced by rarefying (Cameron et al, 2020), satisfies these goals. When a sample is rarefied repeatedly down to a smaller library size (using sampling without replacement), it describes what data might have been obtained if only the smaller library size of sequence variants had been observed.…”
Section: Diversity Analysis In Absence Of a Model To Infer Source Diversity (mentioning)
confidence: 99%
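
Rarefying repeatedly, as described in this statement, can be illustrated with a short sketch: one sample is subsampled many times without replacement to the same smaller library size, and a diversity metric is computed on each draw so that the variability introduced by rarefying becomes visible. The counts, the number of repeats, the choice of the Shannon index as the metric, and the helper names are assumptions for illustration, not values or code from the cited work.

```python
import math
import random
import statistics
from collections import Counter


def shannon(counts):
    """Shannon index (natural log) of a vector of counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def rarefy_repeatedly(counts, depth, repeats, seed=0):
    """Return the Shannon index of `repeats` independent rarefied draws."""
    rng = random.Random(seed)
    # One label per read, so rng.sample draws reads without replacement.
    pool = [variant for variant, n in enumerate(counts) for _ in range(n)]
    values = []
    for _ in range(repeats):
        drawn = Counter(rng.sample(pool, depth))
        values.append(shannon(list(drawn.values())))
    return values


sample = [500, 300, 150, 40, 8, 2]  # hypothetical variant counts for one sample
values = rarefy_repeatedly(sample, depth=200, repeats=100)

mean_h = statistics.mean(values)  # central estimate at the rarefied depth
sd_h = statistics.stdev(values)   # spread attributable to rarefying alone
```

The spread in `values` describes what might have been observed had only 200 reads been sequenced; rarefying to the full library size returns the original counts on every draw, so the spread collapses to zero.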