2020
DOI: 10.1186/s12915-020-0748-z

Contaminant DNA in bacterial sequencing experiments is a major source of false genetic variability

Abstract: Background: Contaminant DNA is a well-known confounding factor in molecular biology and in genomic repositories. Strikingly, analysis workflows for whole-genome sequencing (WGS) data commonly do not account for errors potentially introduced by contamination, which could lead to the wrong assessment of allele frequency both in basic and clinical research. Results: We used a taxonomic filter to remove contaminant reads from more than 4000 bacterial samples from 20 different studies and performed a comprehensive …
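To make the abstract's point about allele frequency concrete, the sketch below computes the fraction of reads supporting an alternative base at one site, before and after discarding reads that a classifier assigned to a contaminant taxon. It is a minimal illustration under stated assumptions, not the authors' workflow: the `Read` record, the `allele_frequency` helper and the taxon labels are hypothetical, and the read counts in the usage example are invented, not data from the study.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Read:
    """One aligned read covering the site of interest (hypothetical record)."""
    read_id: str
    taxon: str  # label assigned by a taxonomic classifier (assumed upstream step)
    base: str   # base this read shows at the site


def allele_frequency(reads: Iterable[Read], alt: str,
                     keep_taxa: Optional[Set[str]] = None) -> float:
    """Fraction of reads carrying `alt`; if keep_taxa is given, only reads
    whose taxon is in keep_taxa are counted (a simple taxonomic filter)."""
    selected = [r for r in reads if keep_taxa is None or r.taxon in keep_taxa]
    if not selected:
        raise ValueError("no reads left at this site after filtering")
    return sum(r.base == alt for r in selected) / len(selected)


if __name__ == "__main__":
    # Invented counts: 20 target reads show the reference base C,
    # 5 contaminant reads show T at the same position.
    reads = [Read(f"t{i}", "target", "C") for i in range(20)]
    reads += [Read(f"c{i}", "contaminant", "T") for i in range(5)]
    print(allele_frequency(reads, "T"))                       # 0.2
    print(allele_frequency(reads, "T", keep_taxa={"target"}))  # 0.0
```

In this toy case the five contaminant reads look like a variant at 20% frequency when no filter is applied; once only reads assigned to the target taxon are counted, the site carries no alternative allele.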

Cited by 55 publications (43 citation statements). References: 62 publications.
“…Other groups have identified methods to reduce additional sources of error in genomic epidemiology studies. For example, taxonomic filtering can importantly exclude reads from contaminating microbial species [49]. Additionally, other work has found that calling variants for samples independently rather than jointly may improve sensitivity for detecting low-frequency microbial variants [50].…”
Section: Discussion (mentioning; confidence: 99%)
“…Given the potential presence of contaminant DNA not corresponding to MTBC, the Kraken software V2 [13] was first used to classify the WGS reads. Further focus was directed only at those reads that belonged to MTBC species [14]. The WGS analysis, including mapping and variant calling (SNP and INDELS), was performed following a previously reported pipeline [7, 15], which has been described, validated and available online at http://tgu.ibv.csic.es/?page_id=1794.…”
Section: Methods (mentioning; confidence: 99%)
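The classify-then-keep step described in the quotation above (classify reads with Kraken2, then retain only reads assigned to MTBC) can be sketched as follows. This is a minimal sketch, not the pipeline cited there: it assumes Kraken2's default tab-separated per-read output (classified flag `C`/`U`, read ID, numeric taxonomy ID, read length, k-mer mapping) and a child-to-parent taxid map such as one could build from NCBI's `nodes.dmp`. The helper names `is_descendant` and `reads_to_keep` and the toy taxids in the example are hypothetical.

```python
from typing import Dict, Iterable, Set


def is_descendant(taxid: int, ancestor: int, parent: Dict[int, int]) -> bool:
    """True if taxid is `ancestor` itself or lies below it in the tree.

    `parent` maps each taxid to its parent taxid (root maps to itself,
    as in NCBI nodes.dmp). Unknown taxids, including 0 (unclassified),
    return False.
    """
    seen: Set[int] = set()
    while taxid not in seen:
        if taxid == ancestor:
            return True
        seen.add(taxid)
        if taxid not in parent:
            return False
        taxid = parent[taxid]
    return False  # reached the root (a self-loop) without meeting ancestor


def reads_to_keep(kraken_lines: Iterable[str], ancestor: int,
                  parent: Dict[int, int]) -> Set[str]:
    """Read IDs classified at or below `ancestor` in Kraken2 per-read output.

    Assumes the default format with a numeric taxid in the third column
    (i.e. Kraken2 not run with --use-names).
    """
    keep: Set[str] = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, read_id, taxid = fields[0], fields[1], fields[2]
        if status == "C" and is_descendant(int(taxid), ancestor, parent):
            keep.add(read_id)
    return keep


if __name__ == "__main__":
    # Toy taxonomy with hypothetical taxids: 1 root, 2 a higher-rank node,
    # 10 the target complex, 11 and 12 species in it, 20 an unrelated taxon.
    parent = {1: 1, 2: 1, 10: 2, 11: 10, 12: 10, 20: 2}
    lines = [
        "C\tr1\t11\t150\t11:116",
        "C\tr2\t20\t150\t20:116",
        "U\tr3\t0\t150\t0:116",
        "C\tr4\t10\t150\t10:116",
        "C\tr5\t2\t150\t2:116",
    ]
    print(sorted(reads_to_keep(lines, 10, parent)))  # ['r1', 'r4']
```

Reads that Kraken2 assigns only to a rank above the target (like `r5` here) fail the descendant test and are dropped. Whether to keep such reads is a judgment call that trades read depth against residual contamination.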
“…DNA that is not the focus of the study). While this is an important consideration in single genome studies (Goig et al. 2020), it represents a particular challenge in mWGS, where host DNA [is] likely to be highly prevalent in samples and hence constitute a significant fraction of the sequenced reads.…”
Section: Matching Sequences To Reference Databases (mentioning; confidence: 99%)