2018
DOI: 10.1101/403824
Preprint

Contaminant DNA in bacterial sequencing experiments is a major source of false genetic variability

Abstract: Contaminant DNA is a well-known confounding factor in molecular biology and in genomic repositories. Strikingly, analysis workflows for whole-genome sequencing (WGS) data usually neglect the errors introduced by potential contaminations. We performed a comprehensive evaluation of the extent and impact of contaminant DNA in WGS by analyzing more than 4,000 bacterial samples from 20 different studies. We found that contaminations are pervasive and can introduce large biases in variant analysis. We showed that th…

citations
Cited by 19 publications
(25 citation statements)
references
References 46 publications
“…Apply a taxonomic filter and select reads corresponding to the taxa of interest or exclude reads mapping to other taxa [65].…”
Section: Discussion (mentioning)
confidence: 99%
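
A minimal sketch of the taxonomic filter described in the statement above, not the citing authors' implementation. It assumes per-read classifications in Kraken2's default tab-separated output (status `C`/`U`, read ID, numeric taxid, and further columns) and that the caller supplies the full set of taxids to keep, descendants included. The function name `select_read_ids` and the example taxids are hypothetical.

```python
def select_read_ids(kraken_lines, keep_taxids):
    """Return the IDs of classified reads whose taxid is in keep_taxids.

    Assumes Kraken2 default per-read output: status, read ID, taxid, ...
    keep_taxids is a set of taxid strings supplied by the caller; it must
    already contain every descendant taxid of the taxa of interest.
    """
    kept = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        status, read_id, taxid = fields[0], fields[1], fields[2]
        if status == "C" and taxid in keep_taxids:
            kept.add(read_id)
    return kept


# Illustrative input only; these lines are made up for the example.
example = [
    "C\tread1\t1773\t150\t1773:116",
    "U\tread2\t0\t150\t0:116",
    "C\tread3\t9606\t150\t9606:116",
]
selected = select_read_ids(example, {"1773"})
```

The returned read IDs can then be used to subset the FASTQ files with any read-extraction tool. Excluding reads that map to other taxa is the complementary operation: keep every read whose ID is not in a set built from the unwanted taxids.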
“…Kraken: To filter out samples that may have been contaminated by foreign DNA during sample preparation, we ran the trimmed reads for each longitudinal and replicate isolate through Kraken2 (Wood and Salzberg 2014) against a database (Goig et al 2020) containing all of the sequences of bacteria, archaea, virus, protozoa, plasmids and fungi in RefSeq (release 90) and the human genome (GRCh38). We calculated the proportion reads that were taxonomically classified under the Mycobacterium tuberculosis Complex (MTBC) for each isolate and implemented a threshold of 95%.…”
Section: Mixed Lineage and Contamination Detection For Longitudinal A… (mentioning)
confidence: 99%
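
The thresholding step in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the citing authors' code. It assumes the standard six-column Kraken2 report (percentage, clade reads, direct reads, rank code, taxid, name), takes the denominator as all reads (unclassified plus root clade), treats "a threshold of 95%" as "at least 95%", and uses 77643 as the NCBI taxid for the MTBC. The function names are hypothetical.

```python
MTBC_TAXID = "77643"  # assumed NCBI taxid for Mycobacterium tuberculosis complex


def clade_fraction(report_lines, taxid):
    """Fraction of all reads assigned to the clade rooted at taxid.

    Assumes the six-column Kraken2 report; column 2 is the number of
    reads covered by the clade and column 5 is the NCBI taxid.
    """
    total = 0
    clade = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        clade_reads = int(fields[1])
        line_taxid = fields[4].strip()
        if line_taxid in ("0", "1"):  # unclassified (0) and root (1)
            total += clade_reads
        if line_taxid == taxid:
            clade = clade_reads
    return clade / total if total else 0.0


def passes_threshold(report_lines, taxid=MTBC_TAXID, threshold=0.95):
    """True if the clade fraction meets the threshold (assumed inclusive)."""
    return clade_fraction(report_lines, taxid) >= threshold


# Illustrative report only; the counts are made up for the example.
report = [
    "  2.00\t20\t20\tU\t0\tunclassified",
    " 98.00\t980\t0\tR\t1\troot",
    " 96.00\t960\t0\tG1\t77643\t    Mycobacterium tuberculosis complex",
]
keep_sample = passes_threshold(report)
```

Whether unclassified reads belong in the denominator is a design choice the quoted text does not settle. Excluding them would raise the computed proportion for samples with many unclassified reads.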
“…We have demonstrated that the same conclusions can be robustly made from plate sweeps by using mGEMS. Additionally, since the pipeline relies on modelling pseudoalignments against reference sequences, mGEMS acts as quality control for sequencing reads from samples that inadvertently contain multiple lineages or contamination, which can disrupt downstream analyses like SNP calling (Goig et al 2020). Our pipeline also significantly outperforms the current state-of-the-art in analysing sequencing data from closely related mixed samples, reaching accuracy levels likely constrained by technical variation in the sequencing data and limitations in assembling sequencing data with variable coverage.…”
Section: Discussion (mentioning)
confidence: 99%
“…mGEMS demonstrates the power of plate sweep sequencing in genomic epidemiology and enables a change in the currently dominant framework that confers multiple benefits over both whole-genome shotgun metagenomics and isolate sequencing. Studies of the population structures of opportunistic pathogens have revealed extensive strain-level within-host variation (Stoesser et al 2015; Golubchik et al 2013; Paterson et al 2015; Greenblum et al 2015; Brodrick et al 2017; Lieberman et al 2014) with adverse implications for transmission analyses relying solely on isolate sequencing (Worby et al 2014; Stoesser et al 2015) and longitudinal studies reporting the absence or re-emergence of strains in a host based on colony picks (Paterson et al 2015; Brodrick et al 2016, 2017). While whole-genome shotgun metagenomics solves these issues to some extent (Gu et al 2019; Forbes et al 2017), the culture-free nature suffers from issues with both bacterial and host DNA contamination particularly affecting the sensitivity for detecting strains in low abundance (Whelan et al 2020; Ivy et al 2018; McArdle and Kaforou 2020; Salter et al 2014).…”
Section: Discussion (mentioning)
confidence: 99%