2022
DOI: 10.1101/2022.01.31.478554
Preprint

The Human “Contaminome”: Bacterial, Viral, and Computational Contamination in Whole Genome Sequences from 1,000 Families

Abstract: Background: The unmapped readspace of whole genome sequencing data tends to be large but is often ignored. We posit that it contains valuable signals of both human infection and contamination. Using unmapped and poorly aligned reads from whole genome sequences (WGS) of over 1,000 families and 5,000 individuals, we present insights into common viral, bacterial, and computational contamination that plagues whole genome sequencing studies. Results: We present several notable results: (1) In addition to known contaminan…

Cited by 4 publications (10 citation statements)
References 57 publications (67 reference statements)
“…2C). We demonstrated previously that this strategy can detect viral experimental and computational contamination in WGS [8]. Similarly to our previous findings, viruses associated with sample cell type or sequencing plate are likely contaminants or reagents used in the immortalization, storage, or sequencing pipelines.…”
Section: Unmapped Read Space Characterizes Prevalence and Abundance O...
supporting
confidence: 83%
“…As described in our previous work [8], one reason for biological source-dependent abundance is that different reagents and pipelines are used in LCL versus whole blood prep and storage pipelines and may lead to different contamination profiles. This is probably the case for EBV (gamma herpes 4) enriched in LCLs (acquired during the EBV-induced immortalization step) and its relatives, as well as the non-herpesviruses enriched in whole blood samples.…”
Section: Unmapped Read Space Characterizes Prevalence and Abundance O...
mentioning
confidence: 99%
“…On a population-scale, we therefore understand much less about the role these regions play in health and disease, and the amount of genetic diversity present in these regions [16]. Reads originating from these regions - as well as from viruses and bacteria in hosts or NGS reagents [6, 5] - collectively make up the unmapped read space.…”
Section: Introduction
mentioning
confidence: 99%
“…A variety of sources can introduce microbial contamination. External sources include personnel, the laboratory environment, and kits and reagents used for collecting and processing samples [2, 3, 11-20]. Internal sources of contamination may include human error, such as sample mislabeling or inadvertent mixing [3, 11, 17, 21].…”
mentioning
confidence: 99%