2017
DOI: 10.1101/221499
Preprint

Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data

Abstract: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants: DNA sequences not truly present in the sample. Contaminants come from a variety of sources, including reagents. Appropriate laboratory practices can reduce contamination in MGS data, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package which implements a statistical classification procedure for identifying cont…

Cited by 122 publications (99 citation statements)
References 62 publications (105 reference statements)
“…To determine the level of false positives related to laboratory processing, we used negative controls for the DNA isolation and PCR processes and sequenced these negative controls. Contaminating sequences are expected to be present at greater proportions in negative controls than in samples (Davis, Proctor, Holmes, Relman, & Callahan, ). To remove potentially contaminating sequence reads from samples, we defined a sequence removal threshold, the maximum number of reads attributed to a taxon in a negative control.…”
Section: Discussion (mentioning)
confidence: 99%
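The threshold described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function names, the dictionary layout, and the taxon names in the usage note are hypothetical, and the rule for applying the threshold (dropping any sample count at or below it) is an assumption, since the quoted text is cut off before it says how the threshold was used.

```python
from typing import Dict, List


def removal_thresholds(negative_controls: List[Dict[str, int]]) -> Dict[str, int]:
    """Per-taxon threshold: the largest read count seen in any negative control."""
    thresholds: Dict[str, int] = {}
    for control in negative_controls:
        for taxon, reads in control.items():
            thresholds[taxon] = max(thresholds.get(taxon, 0), reads)
    return thresholds


def filter_sample(sample: Dict[str, int], thresholds: Dict[str, int]) -> Dict[str, int]:
    """Keep only taxa whose read count exceeds the threshold.

    Assumption: counts at or below the threshold are treated as possible
    contamination and removed. Taxa absent from all controls have threshold 0.
    """
    return {
        taxon: reads
        for taxon, reads in sample.items()
        if reads > thresholds.get(taxon, 0)
    }
```

For example, with hypothetical controls `[{"Pinus": 12, "Quercus": 0}, {"Pinus": 3, "Betula": 5}]`, the thresholds are 12 for Pinus, 0 for Quercus, and 5 for Betula, so a sample `{"Pinus": 10, "Betula": 40, "Salix": 7}` keeps only Betula and Salix.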
“…We suggest that negative controls be used for each step in the metabarcoding process (DNA isolation, PCR) and these negative controls should be included in the pool for Illumina sequencing. The data from negative controls can be used to remove contaminating sequences from samples in silico (Davis et al., ). Third, in order to reduce false‐positive determinations, we suggest that researchers take precautions to reduce environmentally common windborne pollen by approaches such as bleaching surfaces, filtering incoming air, the use of a laminar flow hood and/or not conducting procedures during the time period when wind‐pollinated trees (such as Pinus ) are producing pollen.…”
Section: Discussion (mentioning)
confidence: 99%
“…The impact of contamination increases in samples with small amounts of true exogenous DNA and can swamp the signal from the host's microbiome (Lusk, ; Salter et al., ). Contamination can be assessed using negative controls (e.g., Davis, Proctor, Holmes, Relman, & Callahan, ). However, the data used in this study were initially produced with the sole focus on the host organism.…”
Section: Methods (mentioning)
confidence: 99%
“…Because we did not sequence a negative extraction control, we used decontam's frequency-filtering technique, which assumes a negative linear relationship between initial DNA concentration and frequency of the potential contaminant (Davis et al, 2018).…”
Section: Contamination Elimination Methods and Re… (mentioning)
confidence: 99%
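The negative relationship mentioned in the statement above can be illustrated with a small slope calculation. This is only a sketch of the idea, not decontam's actual classification procedure: the function name is hypothetical, and working on a log-log scale is an assumption of this example (the quoted statement does not specify a scale). Under that assumption, a sequence whose frequency falls in inverse proportion to DNA concentration gives a slope near -1, while a sequence whose frequency is independent of concentration gives a slope near 0.

```python
import math
from typing import Sequence


def log_log_slope(concentrations: Sequence[float], frequencies: Sequence[float]) -> float:
    """Least-squares slope of log(frequency) against log(concentration).

    All inputs must be strictly positive, and the concentrations must not
    all be identical (otherwise the slope is undefined).
    """
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

With hypothetical concentrations `[1, 2, 4, 8]`, frequencies `[0.08, 0.04, 0.02, 0.01]` halve each time the concentration doubles and yield a slope of about -1, whereas a constant frequency of `0.05` yields a slope of about 0.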