Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
Discovery of rare mutations in populations requires methods, such as TILLING (for Targeting Induced Local Lesions in Genomes), for processing and analyzing many individuals in parallel. Previous TILLING protocols employed enzymatic or physical discrimination of heteroduplexed from homoduplexed target DNA. Using mutant populations of rice (Oryza sativa) and wheat (Triticum durum), we developed a method based on Illumina sequencing of target genes amplified from multidimensionally pooled templates representing 768 individuals per experiment. Parallel processing of sequencing libraries was aided by unique tracer sequences and barcodes allowing flexibility in the number and pooling arrangement of targeted genes, species, and pooling scheme. Sequencing reads were processed and aligned to the reference to identify possible single-nucleotide changes, which were then evaluated for frequency, sequencing quality, intersection pattern in pools, and statistical relevance to produce a Bayesian score with an associated confidence threshold. Discovery was robust both in rice and wheat using either bidimensional or tridimensional pooling schemes. The method compared favorably with other molecular and computational approaches, providing high sensitivity and specificity.
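The "intersection pattern in pools" idea in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical tridimensional layout of 8 plates × 8 rows × 12 columns (which gives 768 individuals); the abstract does not state the actual pool dimensions, and the function names `pools_for` and `decode` are illustrative only. Each individual contributes DNA to one pool per dimension, so a mutation carried by one individual should appear in exactly one pool of each dimension.

```python
from itertools import product

# Assumed (hypothetical) layout: 8 plates x 8 rows x 12 columns = 768 individuals.
DIMS = (8, 8, 12)


def pools_for(individual):
    """Return the (plate, row, column) pool indices for an individual ID in 0..767."""
    plate, rest = divmod(individual, DIMS[1] * DIMS[2])
    row, col = divmod(rest, DIMS[2])
    return plate, row, col


def decode(positive_plates, positive_rows, positive_cols):
    """Return the candidate individuals at the intersection of the positive pools."""
    return sorted(
        (p * DIMS[1] + r) * DIMS[2] + c
        for p, r, c in product(positive_plates, positive_rows, positive_cols)
    )


# One mutant: a single positive pool per dimension identifies it uniquely.
print(pools_for(500))
print(decode({5}, {1}, {8}))
# Two positive plate pools for the same row and column give two candidates.
print(decode({0, 5}, {1}, {8}))
```

When more than one pool is positive in a dimension, the intersection yields several candidates. This sketch covers only the decoding step and leaves out the frequency, quality, and Bayesian scoring that the abstract describes.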
Huanglongbing (HLB) or “citrus greening” is the most destructive citrus disease worldwide. In this work, we studied host responses of citrus to infection with Candidatus Liberibacter asiaticus (CaLas) using next-generation sequencing technologies. A deep mRNA profile was obtained from peel of healthy and HLB-affected fruit. It was followed by pathway and protein-protein network analysis and quantitative real-time PCR analysis of highly regulated genes. We identified differentially regulated pathways and constructed networks that provide a deep insight into the metabolism of affected fruit. Data mining revealed that HLB enhanced transcription of genes involved in the light reactions of photosynthesis and in ATP synthesis. Activation of protein degradation and misfolding processes was observed at the transcriptomic level. Transcripts for heat shock proteins were down-regulated at all disease stages, resulting in further protein misfolding. HLB strongly affected pathways involved in source-sink communication, including sucrose and starch metabolism and hormone synthesis and signaling. Transcription of several genes involved in the synthesis and signal transduction of cytokinins and gibberellins was repressed while that of genes involved in ethylene pathways was induced. CaLas infection triggered a response via both the salicylic acid and jasmonic acid pathways and increased the transcript abundance of several members of the WRKY family of transcription factors. Findings focused on the fruit provide valuable insight for understanding the mechanisms of the HLB-induced fruit disorder and eventually developing methods based on small-molecule applications to mitigate its devastating effects on fruit production.
Previous work using glass microneedles to apply calibrated, localized force to neurons showed that tensile force is a sufficient signal for neurite initiation and elongation. However, previous studies did not examine the kinetics or probability of neurite initiation as a function of force or the rate of force application. Here we report the use of a new technique, magnetic bead force application, to systematically investigate the role of force in these phenomena with better ease of use and control over force than glass microneedles. Force-induced neurite initiation from embryonic chick forebrain neurons appeared to be a first-order random process whose rate increased with increasing force, and required the presence of peripheral microtubules. In addition, the probability of initiation was more than twofold lower for neurons exposed to rapid initial force ramps (450 pN/s) than for neurons exposed to slower ramps (1.5 and 11 pN/s). We observed a low force threshold for elongation (15-100 pN), which was not previously detected in chick forebrain neurites elongated by glass microneedles. Finally, neurites subjected to constant force elongated at variable instantaneous rates, and switched abruptly between elongation and retraction, similar to spontaneous, growth-cone-mediated outgrowth and microtubule dynamic instability.
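The first-order description in the abstract above can be written out explicitly. As a sketch, assume a force-dependent rate constant $k(F)$; both the symbol and its functional form are assumptions, since the abstract states only that the rate increases with force. The probability that a neuron held at constant force $F$ has initiated a neurite by time $t$ would then be:

```latex
P_{\text{init}}(t \mid F) = 1 - e^{-k(F)\,t}, \qquad \frac{dk}{dF} > 0
```

Under this assumption, waiting times to initiation at a fixed force are exponentially distributed with mean $1/k(F)$. The expression does not cover the force-ramp experiments, where the force changes over time.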
Genetically programmed DNA rearrangements can regulate mRNA expression at an individual locus or, for some organisms, on a genome-wide scale. Ciliates rely on a remarkable process of whole-genome remodeling by DNA elimination to differentiate an expressed macronucleus (MAC) from a copy of the germline micronucleus (MIC) in each cycle of sexual reproduction. Here we describe results from the first high-throughput sequencing effort to investigate ciliate genome restructuring, comparing Sanger long-read sequences from a Tetrahymena thermophila MIC genome library to the MAC genome assembly. With almost 25% coverage of the unique-sequence MAC genome by MIC genome sequence reads, we created a resource for positional analysis of MIC-specific DNA removal that pinpoints MAC genome sites of DNA elimination at nucleotide resolution. The widespread distribution of internal eliminated sequences (IES) in promoter regions and introns suggests that MAC genome restructuring is essential not only for what it removes (for example, active transposons) but also for what it creates (for example, splicing-competent introns). Consistent with the heterogeneous boundaries and epigenetically modulated efficiency of individual IES deletions studied to date, we find that IES sites are dramatically under-represented in the ∼25% of the MAC genome encoding exons. As an exception to this general rule, we discovered a previously unknown class of small (<500 bp) IES with precise elimination boundaries that can contribute the 3′ exon of an mRNA expressed during genome restructuring, providing a new mechanism for expanding mRNA complexity in a developmentally regulated manner.
Background: The application of next generation sequencing technologies and bioinformatic scripts to identify high frequency SNPs distributed throughout the peach genome is described. Three peach genomes were sequenced using Roche 454 and Illumina/Solexa technologies to obtain long contigs for alignment to the draft 'Lovell' peach sequence as well as sufficient depth of coverage for 'in silico' SNP discovery. Description: The sequences were aligned to the 'Lovell' peach genome released April 01, 2010 by the International Peach Genome Initiative (IPGI). 'Dr. Davis', 'F8, 1-42' and 'Georgia Belle' were sequenced to add SNPs segregating in two breeding populations, Pop DF ('Dr. Davis' × 'F8, 1-42') and Pop DG ('Dr. Davis' × 'Georgia Belle'). Roche 454 sequencing produced 980,000 total reads with 236 Mb sequence for 'Dr. Davis' and 735,000 total reads with 172 Mb sequence for 'F8, 1-42'. 84 bp × 84 bp paired end Illumina/Solexa sequences yielded 25.5, 21.4, and 25.5 million sequences for 'Dr. Davis', 'F8, 1-42' and 'Georgia Belle', respectively. BWA/SAMtools were used for alignment of raw reads and SNP detection, with custom PERL scripts for SNP filtering. Velvet's Columbus module was used for sequence assembly. Comparison of aligned and overlapping sequences from both Roche 454 and Illumina/Solexa resulted in the selection of 6654 high quality SNPs for 'Dr. Davis' vs. 'F8, 1-42' and 'Georgia Belle', distributed on eight major peach genome scaffolds as defined from the 'Lovell' assembly. Conclusion: The eight scaffolds contained about 215-225 Mb of peach genomic sequences with one SNP per ~40,000 bases. All sequences from Roche 454 and Illumina/Solexa have been submitted to NCBI for public use in the Short Read Archive database. SNPs have been deposited in the NCBI SNP database.