2012
DOI: 10.1093/nar/gks001

Summarizing and correcting the GC content bias in high-throughput sequencing

Abstract: GC content bias describes the dependence between fragment count (read coverage) and GC content found in Illumina sequencing data. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation (DNA-seq). The bias is not consistent between samples; and there is no consensus as to the best methods to remove it in a single sample. We analyze regularities in the GC bias patterns, and find a compact description for this unimodal …
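To make the dependence described in the abstract concrete, here is a minimal sketch of how a GC curve (mean read count per GC stratum) could be tabulated. It is not the paper's estimator: the function names, the `(sequence, read_count)` input format, and the choice of 10 equal-width GC bins are all assumptions made for illustration.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C among unambiguous (A/C/G/T) bases; NaN if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else float("nan")


def gc_curve(windows, n_bins=10):
    """Mean read count per GC bin.

    windows: iterable of (sequence, read_count) pairs (hypothetical input format).
    Returns {bin_index: mean_count} for bins that contain at least one window.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [sum of counts, number of windows]
    for seq, count in windows:
        gc = gc_fraction(seq)
        if gc != gc:  # NaN: window has no unambiguous bases, skip it
            continue
        b = min(int(gc * n_bins), n_bins - 1)  # GC = 1.0 goes in the last bin
        totals[b][0] += count
        totals[b][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}
```

A flat curve would indicate no GC dependence; a curve that peaks at intermediate GC and falls off toward both extremes would be consistent with the unimodal pattern the abstract mentions.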

Cited by 813 publications (758 citation statements)
References: 21 publications
“…The aligned read numbers were counted across the whole genome with 200 bp sliding windows and 100 bp slide steps using an in-house Perl script. The GC bias of the Illumina platform was corrected using LOESS smoothing toward a pattern of uniform coverage at all GC percentages, as previously described [54]. CNV candidate windows were initially defined as having five out of seven or more sequential 200 bp overlapping windows with read depth values that differed significantly from the whole-genome average depth (>Mean + 2 × Stdev).…”
Section: Methods (mentioning)
confidence: 99%
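The windowing and candidate-calling steps quoted above can be sketched as follows. This is one reading of the quoted text, not the citing authors' Perl script: the function names and the input format (a list of 0-based read start positions on one chromosome) are hypothetical, and the "five out of seven" rule is interpreted here as "any run of seven consecutive windows in which at least five exceed mean + 2 × SD". GC correction is left out of this sketch.

```python
from statistics import mean, pstdev


def window_counts(read_starts, chrom_len, size=200, step=100):
    """Count read start positions in overlapping windows [s, s + size)."""
    starts = list(range(0, max(chrom_len - size, 0) + 1, step))
    counts = [0] * len(starts)
    for pos in read_starts:
        # Window i starts at i * step; pos lies in it when i*step <= pos < i*step + size.
        first = max(0, (pos - size) // step + 1)
        last = min(pos // step, len(counts) - 1)
        for i in range(first, last + 1):
            counts[i] += 1
    return starts, counts


def candidate_windows(depths, run=7, min_hits=5, n_sd=2.0):
    """Indices of windows inside any run of `run` consecutive windows
    where at least `min_hits` exceed mean + n_sd * SD of all depths."""
    threshold = mean(depths) + n_sd * pstdev(depths)
    high = [d > threshold for d in depths]
    flagged = set()
    for i in range(len(depths) - run + 1):
        if sum(high[i:i + run]) >= min_hits:
            flagged.update(range(i, i + run))
    return sorted(flagged)
```

As quoted, the threshold only flags depths above the genome-wide average, so this sketch detects gains; a symmetric test for losses would be an extension beyond what the statement describes.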
“…The variation is largely due to variation in sequencing depth and PstI site occurrence, which are both related to GC content (Fig. S3; Benjamini and Speed 2012; see Supporting Information note for full discussion). However, SNP density is not correlated with recombination rate, final map lengths, or crossover resolution, and final map lengths are consistent across all crosses (see below), so we do not believe variation in SNP density has affected our results (see Supporting Information note for further details).…”
Section: Results (mentioning)
confidence: 99%
“…The resulting log2(raw copy number ratios) are corrected for the GC content in each amplicon, as previously described (Supplemental Figure S2B) [19], to adjust for the observation that GC- and AT-rich fragments are underrepresented in sequencing due to the unimodal effect that GC content has on DNA melting temperature. This correction is standard for most sequencing-based CNA detection approaches.…”
Section: An Algorithm For Detecting CNAs In Targeted Amplicon-based N (mentioning)
confidence: 99%
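The statement above does not say which smoother is used for the per-amplicon correction. As an illustration only, the sketch below removes a GC trend from log2 ratios with a LOESS-style local linear fit (tricube weights, no robustness iterations) and recenters on the median. The function names, the `frac` span parameter, and the median recentering are assumptions, not details taken from the cited or citing papers.

```python
from statistics import median


def loess_fit(x, y, frac=0.5):
    """Local linear fit with tricube weights, evaluated at every x[i]."""
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)  # > 0 because x0 itself always has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def gc_correct(gc, log2_ratio, frac=0.5):
    """Subtract the fitted GC trend and recenter on the median log2 ratio."""
    trend = loess_fit(gc, log2_ratio, frac)
    center = median(log2_ratio)
    return [r - t + center for r, t in zip(log2_ratio, trend)]
```

With this design, amplicons whose log2 ratio is fully explained by GC content end up at the median after correction, so remaining deviations are candidates for real copy number differences. A production pipeline would more likely call an established LOESS implementation than this hand-rolled one.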