Supplementary data are available at Bioinformatics online.
Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water use efficiency. It is also the second most important tropical fruit after banana in terms of international trade. We sequenced the genomes of pineapple varieties ‘F153’ and ‘MD2’, and a wild pineapple relative, A. bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication than sequenced grass genomes. It therefore provides an important reference for elucidating gene content and structure in the last common ancestor of extant members of the grass family (Poaceae). Pineapple has a conserved karyotype with seven pre-ρ duplication chromosomes that are ancestral to extant grass karyotypes. The pineapple lineage has transitioned from C3 photosynthesis to CAM, which uses beta-carbonic anhydrase (βCA) for initial capture of CO2, and CAM-related genes exhibit a diel expression pattern in photosynthetic tissues. The promoter regions of all three βCA genes contain a CCA1 binding site that can bind circadian core oscillators. CAM pathway genes were enriched for cis-regulatory elements, including the morning (CCACAC) and evening (AAAATATC) elements associated with regulation of circadian-clock genes, providing the first link between CAM and circadian clock regulation. Gene-interaction network analysis revealed both activation and repression of regulatory elements that control key enzymes in CAM photosynthesis, indicating that CAM evolved by reconfiguration of pathways preexisting in C3 plants. Pineapple CAM photosynthesis is the result of regulatory neofunctionalization of preexisting gene copies, not acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.
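The enrichment claim has two steps. First, each promoter is scanned for the morning (CCACAC) and evening (AAAATATC) elements. Second, the analysis asks whether CAM genes carry these elements more often than the genome-wide background. The sketch below illustrates both steps under stated assumptions. It is not the authors' pipeline. It uses exact matching on both strands, with no mismatches and no position weight matrices. It also applies a one-sided hypergeometric test, which is one common choice for this kind of question. The function names are hypothetical.

```python
from math import comb

# Motif strings as given in the text; exact matching only (an assumption).
MORNING_ELEMENT = "CCACAC"
EVENING_ELEMENT = "AAAATATC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(promoter: str, motif: str) -> int:
    """Count exact, possibly overlapping, motif hits on both strands.

    A palindromic motif site would be counted twice (once per strand);
    the two motifs above are not palindromic.
    """
    promoter = promoter.upper()
    hits = 0
    for strand in (promoter, reverse_complement(promoter)):
        pos = strand.find(motif)
        while pos != -1:
            hits += 1
            pos = strand.find(motif, pos + 1)
    return hits


def enrichment_p_value(k: int, n: int, big_k: int, big_n: int) -> float:
    """One-sided hypergeometric P(X >= k).

    big_n: genes in the background set
    big_k: background genes whose promoter carries the motif
    n:     genes in the pathway of interest (e.g. CAM genes)
    k:     pathway genes whose promoter carries the motif
    """
    upper = min(n, big_k)
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i) for i in range(k, upper + 1))
    return tail / comb(big_n, n)
```

For example, `count_motif(seq, EVENING_ELEMENT) > 0` could flag a promoter as carrying the evening element. Flagged promoters then give the counts `k` and `big_k` passed to the test.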
Monitoring the progress of DNA molecules through a membrane pore has been proposed as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used it to sequence the Saccharomyces cerevisiae genome. To make use of these data, we developed Nanocorr, a novel open-source hybrid error correction algorithm designed specifically for Oxford Nanopore reads. Existing packages could not handle reads this long (5-50 kbp) at such high error rates (between ∼5% and 40%). With this new method, we performed hybrid error correction of the nanopore reads using complementary MiSeq data and produced a highly contiguous and accurate de novo assembly. Its contig N50 length is more than ten times that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and it has >99.88% consensus identity with the reference. Furthermore, the assembly built from the long nanopore reads presents a much more complete representation of the genome's features. It correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent from the Illumina-only assembly.
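The comparison above rests on the contig N50. This is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows. The function name is hypothetical, and the code is not taken from Nanocorr.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half the assembly.
    """
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must contain at least one positive length")


# Fold improvement reported in the text: 678 kbp versus 59.9 kbp.
print(f"{678 / 59.9:.1f}x")  # 11.3x
```

Sorting dominates the cost, so the function runs in O(n log n) time for n contigs.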
When bacterial populations are defined through whole genome sequencing (WGS), the samples often have unknown evolutionary histories. WGS is increasingly used in routine diagnostics, surveillance, and epidemiology, so a vast amount of short-read data is available. Phylogenetic trees (dendrograms) are used to visualise the relationships and similarities between these samples. Standard reference- and assembly-based methods can take a substantial amount of time to generate these phylogenetic relationships, often longer than it took to sequence the samples in the first place. Faster methods (Ondov et al. 2016; Wood and Salzberg 2014) can loosely classify samples into known taxonomic categories. However, the loss of granularity reduces the resolution of the relationships between samples. This can be the difference between ruling a sample in or out of an outbreak, which is a clinically important finding for genomic epidemiologists. Other methods (Boratyn et al. 2014) are closed source, which prevents independent scrutiny. SaffronTree uses the k-mer profiles of samples to rapidly construct a tree directly from raw reads in FASTQ format or contigs in FASTA format. It supports NGS data (such as Illumina), third-generation long-read data (PacBio/Nanopore), and assembled sequences (FASTA). First, a k-mer count database is constructed for each sample using KMC (Kokot, Długosz, and Deorowicz 2017). Next, the intersection of the k-mer databases is computed for each pair of samples, and the number of k-mers in common is recorded in a distance matrix. Finally, the distance matrix is used to construct a UPGMA tree (Sokal and Michener 1958) in Newick format. This tree method was chosen because it is fast. However, its result is of lower quality than those of slower methods that perform ancestral sequence reconstruction (Stamatakis 2014). The computational complexity of the algorithm is O(N²) in the number of samples, so it is best suited to datasets of fewer than 50 samples.
This gives rapid insights into small datasets in minutes rather than hours. SaffronTree provides better granularity than MLST because it uses more of the underlying genome. It can also operate at low depth of coverage, is reference-free and species-agnostic, and has a low memory requirement.
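The three steps (per-sample k-mer sets, pairwise intersections, UPGMA) can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not SaffronTree's code:

- Plain Python sets stand in for KMC databases.
- K-mers are taken from the forward strand only, with no canonical k-mers.
- The shared-k-mer count is converted to a Jaccard distance (1 - shared/union). UPGMA needs a distance rather than a similarity, and this is only one possible conversion.

The pairwise loop makes the O(N²) cost visible.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Distinct forward-strand k-mers of a sequence (stand-in for a KMC database)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distances(samples: dict[str, str], k: int) -> dict[frozenset, float]:
    """Pairwise 1 - |A & B| / |A | B| over all N(N-1)/2 sample pairs."""
    sets = {name: kmer_set(seq, k) for name, seq in samples.items()}
    dist = {}
    for a, b in combinations(sets, 2):
        shared = len(sets[a] & sets[b])  # number of k-mers in common
        union = len(sets[a] | sets[b])
        dist[frozenset((a, b))] = 1.0 - shared / union if union else 0.0
    return dist


def upgma_newick(names: list[str], dist: dict[frozenset, float]) -> str:
    """Cluster with UPGMA and return a Newick string with branch lengths."""
    # Active clusters: id -> (Newick subtree, number of leaves, node height).
    clusters = {name: (name, 1, 0.0) for name in names}
    d = dict(dist)
    next_id = 0
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of active clusters
        a, b = sorted(pair)
        height = d[pair] / 2
        tree_a, n_a, h_a = clusters.pop(a)
        tree_b, n_b, h_b = clusters.pop(b)
        new = f"#{next_id}"  # internal id; assumed not to clash with sample names
        next_id += 1
        # Size-weighted average distance from the merged cluster to the others.
        for c in clusters:
            d[frozenset((new, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        # Drop every distance that still refers to a merged cluster.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        subtree = f"({tree_a}:{height - h_a:.4f},{tree_b}:{height - h_b:.4f})"
        clusters[new] = (subtree, n_a + n_b, height)
    return next(iter(clusters.values()))[0] + ";"


samples = {"A": "ACGTACGTAA", "B": "ACGTACGTAC", "C": "TTTTGGGGCC"}
print(upgma_newick(list(samples), jaccard_distances(samples, k=4)))
# ((A:0.1000,B:0.1000):0.4000,C:0.5000);
```

UPGMA assumes a constant rate of change across lineages (an ultrametric tree). That assumption is part of why it is fast and also why its trees are coarser than model-based reconstructions.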
Background: High-throughput genome-sequencing technologies have uncovered extensive structural variation in eukaryotic genomes, which makes important contributions to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole-genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Results: Here, we use rice as a model to show how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next-generation sequence data. The resulting high-quality assemblies can be compared directly using whole-genome alignment to provide an unbiased assessment. Using this approach, we accurately assess the 'pan-genome' of three divergent rice varieties and document several megabases of each genome that are absent from the other two. Conclusions: Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that standard reference-mapping approaches would miss. We further provide a detailed analysis of several loci associated with agriculturally important traits. These include the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency. Together, they illustrate the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.
Background
Rice (Oryza sativa) provides 20% of the world's dietary energy supply and is the predominant staple food for 17 countries in Asia, 9 countries in North and South America, and 8 countries in Africa. Within O. sativa, there are two major varietal groups, Indica and Japonica, which can be further subdivided into five major subpopulations. Indica and aus share ancestry within the Indica varietal group, while tropical japonica, temperate japonica, and aromatic (Group V) share ancestry within the Japonica varietal group (Figure 1). The time since divergence of the ancestral Indica and Japonica gene pools is estimated at 0.44 million years, based on sequence comparisons between cv. Nipponbare (Japonica) and cv. [...]. This estimate predates the domestication of O. sativa by several hundred thousand years, suggesting that rice cultivation proceeded from multiple, pre-differentiated ancestral pools [1,9-13]. This is consistent with genome-wide estimates of divergence based on gene content [14], transcript levels [15], single nucleotide polymorphisms (SNPs) [3,16], and ...
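Once whole-genome alignments are available, finding genome-specific sequence reduces to interval arithmetic. First, pool the intervals that one assembly aligns to in the other two. Then merge overlapping intervals and report what remains uncovered. The sketch below assumes the alignments have already been converted to half-open (start, end) coordinates on a single sequence of the query genome. The function names and the `min_len` filter are hypothetical and do not come from the authors' software.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[list[int]]:
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def unaligned_regions(
    seq_len: int, aligned: list[tuple[int, int]], min_len: int = 1
) -> list[tuple[int, int]]:
    """Regions of a query sequence not covered by any alignment to the other genomes."""
    gaps = []
    cursor = 0
    for start, end in merge_intervals(aligned):
        if start - cursor >= min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if seq_len - cursor >= min_len:
        gaps.append((cursor, seq_len))
    return gaps


# Alignments of one 100-bp query sequence to genome 2 and to genome 3, pooled.
to_genome2 = [(0, 30), (70, 80)]
to_genome3 = [(20, 50)]
specific = unaligned_regions(100, to_genome2 + to_genome3)
print(specific, sum(end - start for start, end in specific))
# [(50, 70), (80, 100)] 40
```

Summing the uncovered lengths across all sequences gives a per-genome total of genome-specific bases, comparable to the "several megabases" reported above.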
The free-living flatworm Macrostomum lignano has an impressive regenerative capacity. Following injury, it can regenerate an almost entirely new organism because of an abundant somatic stem cell population, the neoblasts. This set of unique properties makes many flatworms attractive organisms for studying the evolution of pathways involved in tissue self-renewal, cell-fate specification, and regeneration. However, their use as models is hampered by the lack of well-assembled and annotated genome sequences, which are fundamental to modern genetic and molecular studies. Here we report the genomic sequence of M. lignano and an accompanying characterization of its transcriptome. The genome structure of M. lignano is remarkably complex, with ∼75% of its sequence consisting of simple repeats and transposon sequences. This made high-quality assembly from Illumina reads alone impossible (N50 = 222 bp). We therefore generated 130× coverage of long sequencing reads on the Pacific Biosciences platform and obtained a substantially improved assembly with an N50 of 64 kbp. We complemented the reference genome with an assembled and annotated transcriptome. We then used both datasets in combination to probe gene-expression patterns during regeneration, examining pathways important to stem cell function.
Flatworms belong to the superphylum Lophotrochozoa, a vast assembly of protostome invertebrates (1, 2) (Fig. 1A). The evolutionary relationships within this clade are poorly resolved, and the specific position of flatworms is currently debated (3, 4). Flatworms have attracted scientific attention for centuries because of their astonishing regenerative capabilities (5, 6) and their ability to "degrow" in a controlled way when starved (7).
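The 130× figure for the PacBio data is a mean depth of coverage: the total number of sequenced bases divided by the genome size. The formula below spells this out. Here ℓᵢ is the length of read i, n is the number of reads, and G is the genome size; these symbols are introduced for illustration only.

```latex
C = \frac{\sum_{i=1}^{n} \ell_i}{G}
\qquad\Longrightarrow\qquad
\sum_{i=1}^{n} \ell_i = C\,G = 130\,G
```

Take a hypothetical 100 Mbp genome; this illustrative size is not the M. lignano estimate. For that genome, 130× coverage would correspond to 130 × 100 Mbp = 13 Gbp of read sequence.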
As far back as the early 1900s, Thomas Morgan recognized the potential of flatworms and conducted a number of fascinating regeneration experiments on planarian flatworms before his focus shifted to Drosophila genetics (8).
Macrostomum lignano (Fig. 1B) is a free-living, regenerating flatworm isolated from the coast of the Mediterranean Sea. M. lignano is an obligatorily cross-fertilizing simultaneous hermaphrodite (9) that belongs to Macrostomorpha. By contrast, the other often-studied free-living flatworms and the human parasitic flatworms all belong to clades that are potentially more derived (less ancestral) than Macrostomorpha (2) (Fig. 1C).
Many flatworms can regenerate nearly their entire body or amputated organs. This regenerative capacity is thought to be attributable to somatic stem cells, termed neoblasts (10, 11). In Schmidtea mediterranea (a planarian flatworm), even a single transplanted neoblast can rescue, regenerate, and change the genotype of a lethally irradiated worm (12). M. lignano can regenerate every tissue except the head region containing the brain (13, 14).
Neoblasts in M. lignano (Fig. 1 D and E), in contrast to most vertebrate somatic stem cells, are plentiful, making up about ...
The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation lies in complex and repetitive regions that may be underreported. To address this, we sequenced SK-BR-3 using long-read single-molecule sequencing from Pacific Biosciences. From these data we developed one of the most detailed maps of structural variations (SVs) available for any cancer genome, with nearly 20,000 variants, most of which were missed by short-read sequencing. Surrounding the important oncogene [...] (also known as [...]), we discovered a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms of cancer genome evolution.
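The statement that most long-read SVs were missed by short-read sequencing implies a comparison between two call sets. One simple way to make that comparison is sketched below under stated assumptions. A long-read call counts as recovered if the short-read set contains a call of the same type, on the same chromosome, with a breakpoint within a fixed tolerance. The `SV` record, the 500 bp default tolerance, and the function name are hypothetical. Dedicated SV-comparison tools also weigh SV length and sequence, which this sketch ignores.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # breakpoint position
    svtype: str   # e.g. "DEL", "INS", "DUP", "INV", "BND"


def missed_calls(query: list[SV], other: list[SV], max_dist: int = 500) -> list[SV]:
    """Calls in `query` with no same-type call in `other` on the same
    chromosome whose breakpoint lies within `max_dist` bp."""
    index: dict[tuple[str, str], list[int]] = {}
    for sv in other:
        index.setdefault((sv.chrom, sv.svtype), []).append(sv.pos)
    for positions in index.values():
        positions.sort()

    missed = []
    for sv in query:
        positions = index.get((sv.chrom, sv.svtype), [])
        # First candidate breakpoint at or after pos - max_dist.
        i = bisect_left(positions, sv.pos - max_dist)
        if i == len(positions) or positions[i] > sv.pos + max_dist:
            missed.append(sv)
    return missed


long_read = [SV("chr1", 1000, "DUP"), SV("chr1", 5000, "DEL"), SV("chr2", 200, "BND")]
short_read = [SV("chr1", 1300, "DUP"), SV("chr1", 5000, "INV")]
missed = missed_calls(long_read, short_read)
print(f"{len(missed)}/{len(long_read)} long-read calls missed by short reads")
# 2/3 long-read calls missed by short reads
```

Sorting each per-chromosome, per-type position list lets every lookup use binary search. Comparing two call sets of sizes m and n then costs roughly O((m + n) log n).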