2014
DOI: 10.1093/bioinformatics/btt771

Bias from removing read duplication in ultra-deep sequencing experiments

Abstract: A Python implementation is freely available at https://bitbucket.org/wanding/duprecover/overview. Contact: wzhou1@mdanderson.org, kchen3@mdanderson.org. Supplementary information: Supplementary data are available at Bioinformatics online.

Cited by 39 publications (31 citation statements)
References 16 publications
“…It is desirable to informatically remove artificial duplicates prior to any count or mapping-based analysis to obtain more accurate estimates of true population sizes or allele frequencies. Artificial and sampling-induced duplicates cannot be informatically separated, although the use of both mate-paired reads in the dereplication process (as performed here) can significantly improve the likelihood of removing artificial rather than sampling-induced duplicates (Zhou et al, 2014). Moreover, due to the high genomic complexity of metagenomic samples, the frequency of sampling-induced duplicates is likely to be quite low (Gomez-Alvarez et al, 2009).…”
Section: Effects Of Library Preparation On Metagenomic Library Quality
Citation type: mentioning
confidence: 99%
“…2). Artificial and sampling-induced duplicates cannot be informatically separated, although the use of both mate-paired reads in the dereplication process (as performed here) can significantly improve the likelihood of removing artificial rather than sampling-induced duplicates (Zhou et al, 2014). Duplicated reads can arise from two sources: overamplification of a limited pool of input molecules ('artificial') or fragmentation of two identical genomic molecules in exactly the same place during library preparation ('sampling induced').…”
Section: Effects Of Library Preparation On Metagenomic Library Quality
Citation type: mentioning
confidence: 99%
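The statement above says that keying dereplication on both mates makes it less likely that sampling-induced duplicates are discarded. The sketch below is one way to illustrate that idea. It is not the duprecover implementation or the citing paper's pipeline; the record layout `(chrom, strand, start_mate1, start_mate2)` and the function name `dedup_counts` are assumptions made for this example.

```python
def dedup_counts(pairs):
    """Count fragments kept after position-based duplicate removal.

    `pairs` is an iterable of (chrom, strand, start_mate1, start_mate2)
    tuples; this layout is hypothetical and chosen only for illustration.
    Returns (kept_using_mate1_only, kept_using_both_mates).
    """
    single_end_keys = set()  # duplicates defined by the first mate's position only
    paired_end_keys = set()  # duplicates defined by the positions of both mates
    for chrom, strand, start1, start2 in pairs:
        single_end_keys.add((chrom, strand, start1))
        paired_end_keys.add((chrom, strand, start1, start2))
    return len(single_end_keys), len(paired_end_keys)


pairs = [
    ("chr1", "+", 100, 350),  # original fragment
    ("chr1", "+", 100, 350),  # same coordinates for both mates: collapsed either way
    ("chr1", "+", 100, 420),  # same start, different mate position: a distinct fragment
    ("chr1", "-", 100, 350),  # opposite strand: a distinct fragment
]
print(dedup_counts(pairs))
```

With the first mate alone, the third pair is collapsed into the first even though its mate maps elsewhere; with both mates in the key it is kept. Pairs that share both coordinates are still collapsed, so the two duplicate types remain inseparable in that case, as the statement notes.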
“…Exome sequencing studies at moderate (approximately 100X) depth rely on read position to identify potential PCR duplicates [17], but amplicon-based (molecular inversion probes [18], RainDrop Digital PCR (RainDance Technologies, Billerica, MA, USA), TruSeq Custom Amplicon (Illumina, San Diego, CA, USA)) methods commonly used for targeted sequencing have reads with the same start and stop positions. Hybridization-based methods, when sequenced deeply, can result in reads that are not PCR duplicates but have the same start stop locations by chance [19]. PCR-free methods are also available, but typically require higher amounts of DNA input (1 to 2 ug), limiting their use in cancer studies.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…T200·2 target‐capture deep‐sequencing data were aligned to human hg19 using BWA and duplicate reads were removed using Picard. A proprietary custom analysis pipeline classified somatic variants on the basis of variant allele frequencies in tumour and matched normal tissue. Functional consequences of somatic variants were assessed by comparison with the dbSNP, COSMIC and TCGA databases and then annotated using VEP, Annovar and CanDrA…”
Section: Methods
Citation type: mentioning
confidence: 99%