Non-random DNA fragmentation in next-generation sequencing

Poptsova, Maria; Il’icheva, I. A.; Nechipurenko, Dmitry; Panchenko, L. A.; Khodikov, Mingian V.; Oparina, N. Yu.; Polozov, R. V.; Nechipurenko, Yu. D.; Grokhovsky, S. L.

doi:10.1038/srep04532

Cited by 108 publications

(85 citation statements)

References 40 publications

“…DNA fragmentation (Poptsova et al, 2014) and PCR biases (Benjamini and Speed, 2012) introduced during library preparation result in a non-uniform sampling of possible sequencing reads and an under-representation of DNA with certain sequence features. Benjamini and Speed found that genomic fragments with high and low GC content are under-represented in Illumina libraries (Benjamini and Speed, 2012), and Manor and Borenstein found that intra-metagenome differences in coverage of different universal, single-copy genes can be explained by their GC content (Manor and Borenstein, 2015).…”

Section: Experimental Protocols Affect Results and Should Be Tracked (mentioning)

confidence: 99%

Toward Accurate and Quantitative Comparative Metagenomics

Nayfach and Pollard, 2016, Cell

Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized.

“…and the methods used for data analysis (e.g., peak calling). Third, ChIP-seq data contain numerous technical biases (Kidder et al, 2011), including formaldehyde crosslinking bias (Solomon and Varshavsky, 1985; Lu et al, 2010; Gavrilov et al, 2015), antibody specificity and variability problems (Parseghian, 2013; Schonbrunn, 2014; Wardle and Tan, 2015) (Figure S15), technical artifacts due to highly expressed regions of the genome (which are not corrected by regular input controls) (Teytelman et al, 2013; Park et al, 2013; Jain et al, 2015), bias due to genome fragmentation and PCR amplification (Bardet et al, 2011; Poptsova et al, 2014), etc. These biases can lead to false-positive and false-negative peaks, and they also significantly affect any quantitative estimates of in vivo TF binding levels derived from ChIP-seq data, in ways that we do not understand well enough to correct (Gavrilov et al, 2015).…”

Section: Results (mentioning)

confidence: 99%

Divergence in DNA Specificity among Paralogous Transcription Factors Contributes to Their Differential In Vivo Binding

et al. 2018

SUMMARY Paralogous transcription factors (TFs) are oftentimes reported to have identical DNA-binding motifs, despite the fact that they perform distinct regulatory functions. Differential genomic targeting by paralogous TFs is generally assumed to be due to interactions with protein co-factors or the chromatin environment. Using a computational-experimental framework called iMADS (integrative modeling and analysis of differential specificity), we show that, contrary to previous assumptions, paralogous TFs bind differently to genomic target sites even in vitro. We used iMADS to quantify, model, and analyze specificity differences between 11 TFs from 4 protein families. We found that paralogous TFs have diverged mainly at medium- and low-affinity sites, which are poorly captured by current motif models. We identify sequence and shape features differentially preferred by paralogous TFs, and we show that the intrinsic differences in specificity among paralogous TFs contribute to their differential in vivo binding. Thus, our study represents a step forward in deciphering the molecular mechanisms of differential specificity in TF families.

“…As there are known or suspected biases in next-generation sequencing (NGS) library construction (see [38] for example), it is possible that a small portion of the genome is missing from the sequencing libraries constructed. Published genome size estimates, however, range from 0.83–1.37 Gb [26], implying scaffold coverage could range anywhere from 64–100% of the genome.…”

Section: Results (mentioning)

confidence: 99%

The Genome and Linkage Map of the Northern Pike (Esox lucius): Conserved Synteny Revealed between the Salmonid Sister Group and the Neoteleostei

et al. 2014

The northern pike is the most frequently studied member of the Esociformes, the closest order to the diverse and economically important Salmoniformes. The ancestor of all salmonids purportedly experienced a whole-genome duplication (WGD) event, making salmonid species ideal for studying the early impacts of genome duplication while complicating their use in wider analyses of teleost evolution. Studies suggest that the Esociformes diverged from the salmonid lineage prior to the WGD, supporting the use of northern pike as a pre-duplication outgroup. Here we present the first genome assembly, reference transcriptome and linkage map for northern pike, and evaluate the suitability of this species to provide a representative pre-duplication genome for future studies of salmonid and teleost evolution. The northern pike genome sequence is composed of 94,267 contigs (N50 = 16,909 bp) contained in 5,688 scaffolds (N50 = 700,535 bp); the total scaffolded genome size is 878 million bases. Multiple lines of evidence suggest that over 96% of the protein-coding genome is present in the genome assembly. The reference transcriptome was constructed from 13 tissues and contains 38,696 transcripts, which are accompanied by normalized expression data in all tissues. Gene-prediction analysis produced a total of 19,601 northern pike-specific gene models. The first-generation linkage map identifies 25 linkage groups, in agreement with northern pike's diploid karyotype of 2N = 50, and facilitates the placement of 46% of assembled bases onto linkage groups. Analyses reveal a high degree of conserved synteny between northern pike and other model teleost genomes. While conservation of gene order is limited to smaller syntenic blocks, the wider conservation of genome organization implies the northern pike exhibits a suitable approximation of a non-duplicated Protacanthopterygiian genome. 
This dataset will facilitate future studies of esocid biology and empower ongoing examinations of the Atlantic salmon and rainbow trout genomes by facilitating their comparison with other major teleost groups.
