2013
DOI: 10.1038/nmeth.2375

Predicting the molecular complexity of sequencing libraries

Abstract: Predicting the molecular complexity of a genomic sequencing library has emerged as a critical but difficult problem in modern applications of genome sequencing. Methods to determine how deeply to sequence, or to predict the benefits of additional sequencing, are almost completely lacking. We introduce an empirical Bayesian method to implicitly model any source of bias and accurately characterize the molecular complexity of a DNA sample or library in almost any sequencing application.

Cited by 284 publications (316 citation statements)
References 15 publications
“…1a). As we are comparing libraries with different sequencing depths, we used the PreSeq package [15] to extrapolate and compare the potential complexity of our libraries (Fig. 1b).…”
Section: Results
mentioning
confidence: 99%
“…For mined data sets using short, single-end reads, reads were extended to 300 bp before generating RPKM values. Potential library complexity was determined using the extrapolate function of the PreSeq package [15]. For expression analysis, normalization of RNA-seq read enrichment was calculated as RPKM at exonic regions only (RefSeq transcripts).…”
Section: Methods
mentioning
confidence: 99%
“…A higher value of the overdispersion parameter r (>0.05 based on our experience) can be used as an indicator of poor library complexity. Another method for characterizing the molecular complexity of a sequencing library is an empirical Bayesian method implemented in preseq software (Daley and Smith, 2013).…”
Section: Estimation of the DAE Ratio at Informative SNP Positions and
mentioning
confidence: 99%
“…This method relies on the observation that the curve of rarefied counts of any feature (for example, operational taxonomic units, named species, predicted genes, functional categories or even short motifs) should plateau if the sample is close to saturation. Use of rarefaction curves in microbial community studies was popularized by tools such as mothur (Schloss et al, 2009) and recently extended to include accurate projections at higher sequencing efforts by preseq (Daley and Smith, 2013), which allows the estimation of coverage across features (arithmetic mean). However, this technique and others like it typically rely on a high-quality assembly, comprehensive reference data sets or both, which are often unavailable for complex or poorly characterized communities (with the probable exception of ribosomal RNA (rRNA) genes).…”
mentioning
confidence: 99%
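
The rarefaction idea described in the last statement can be illustrated with a minimal Python sketch. It only interpolates within the data already observed: it repeatedly shuffles the observations and counts the distinct features seen in growing prefixes. It does not implement the projection to higher sequencing effort that the statement attributes to preseq. The function name `rarefaction_curve`, its parameters, and the toy sample are assumptions made for illustration and do not come from any of the cited tools.

```python
import random


def rarefaction_curve(observations, depths, n_permutations=20, seed=0):
    """Mean number of distinct features seen at each subsampling depth.

    Hypothetical helper for illustration only; not the preseq or mothur API.
    """
    rng = random.Random(seed)
    totals = [0] * len(depths)
    for _ in range(n_permutations):
        # One random ordering of the observations per permutation.
        shuffled = list(observations)
        rng.shuffle(shuffled)
        for i, depth in enumerate(depths):
            # Distinct features among the first `depth` observations.
            totals[i] += len(set(shuffled[:depth]))
    # Average the counts over all permutations.
    return [total / n_permutations for total in totals]


# Toy sample (made up): 100 observations drawn from 4 features.
sample = ["a"] * 50 + ["b"] * 30 + ["c"] * 15 + ["d"] * 5
depths = [10, 25, 50, 100]
curve = rarefaction_curve(sample, depths)
print(list(zip(depths, curve)))
```

If the averaged counts stop growing well before the full depth is reached, the sample is probably close to saturation for that feature type; a curve that is still rising suggests that further sequencing would recover more features.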