Coffee is a valuable beverage crop due to its characteristic flavor, aroma, and the stimulating effects of caffeine. We generated a high-quality draft genome of the species Coffea canephora, which displays a conserved chromosomal gene order among asterid angiosperms. Although it shows no sign of the whole-genome triplication identified in Solanaceae species such as tomato, the genome includes several species-specific gene family expansions, among them N-methyltransferases (NMTs) involved in caffeine production, defense-related genes, and alkaloid and flavonoid enzymes involved in secondary compound synthesis. Comparative analyses of caffeine NMTs demonstrate that these genes expanded through sequential tandem duplications independently of genes from cacao and tea, suggesting that caffeine in eudicots is of polyphyletic origin.
Modern sugarcanes are polyploid interspecific hybrids, combining the high sugar content of Saccharum officinarum with the hardiness, disease resistance and ratooning of Saccharum spontaneum. Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined. The reduction of basic chromosome number from 10 to 8 in S. spontaneum was caused by fissions of 2 ancestral chromosomes followed by translocations to 4 chromosomes. Surprisingly, 80% of nucleotide binding site-encoding genes associated with disease resistance are located in 4 rearranged chromosomes and 51% of those in rearranged regions. Resequencing of 64 S. spontaneum genomes identified balancing selection in rearranged regions, maintaining their diversity. Introgressed S. spontaneum chromosomes in modern sugarcanes are randomly distributed in the AP85-441 genome, indicating random recombination among homologs in different S. spontaneum accessions. The allele-defined Saccharum genome offers new knowledge and resources to accelerate sugarcane improvement.
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly 1. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking regions with skewed GC content and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE) 2. Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome, which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.

The genomes of Arabidopsis 3, rice 4, poplar, grape and Sorghum 5 were first sequenced using high-quality, reiterative Sanger-based approaches, producing a series of 'gold standard' reference genomes. The advent of next-generation sequencing (NGS) technologies substantially reduced sequencing costs, which has enabled sequencing of over 100 plant genomes 1.
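Contig N50, the statistic used above to summarize assembly contiguity, is the length L such that contigs of length L or longer together contain at least half of the total assembled sequence. The sketch below implements that standard definition. It assumes contig lengths are given as integers in base pairs. The function name `n50` and the toy lengths are hypothetical illustrations and are not taken from the Oropetium or black raspberry assembly pipelines.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper).

    Contigs are sorted from longest to shortest, and their lengths are
    accumulated until the running total reaches at least half of the
    total assembly length. The contig at which this happens gives the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    # Reached only when the input is empty.
    raise ValueError("contig_lengths must be non-empty")


# Toy example (hypothetical values, in bp): the total is 23, half is 11.5,
# and 8 + 5 = 13 is the first running total >= 11.5, so N50 = 5.
print(n50([8, 5, 4, 3, 2, 1]))
```

Because N50 depends only on the length distribution, fewer and longer contigs (as with single-molecule long reads) push it upward even when total assembly size is unchanged.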
The quality of plant genome assemblies depends on genome size, ploidy, heterozygosity and sequence coverage, but most NGS-based genomes have on the order of tens of thousands of short contigs distributed in thousands of scaffolds. The short read lengths of NGS, inherent biases and non-random sequencing errors have resulted in highly fragmented draft genome assemblies that are not complete, meaning they are missing biologically meaningful sequences including entire genes, regulatory regions, transposable elements, centromeres, telomeres and haplotype-specific structural variations. It is becoming clear from ENCODE projects that complete genomes are needed to better understand the importance of the non-coding regions of genomes 2.

More than 40% of calories consumed by humans are derived from grasses, and the grass family (Poaceae) is arguably the most important plant family with regard to global food security 6. The size and complexity of most grass genomes have challenged progress in gene discovery and comparative genomics, although draft genomes are now available for most agriculturally important grasses 1. The largest genome assemblies, such as maize (2,300 megabases (Mb)) 7, barley (5,100 Mb) 8 and wheat (hexaploid, 1...
Background
Highbush blueberry (Vaccinium corymbosum) has long been consumed for its unique flavor and composition of health-promoting phytonutrients. However, breeding efforts to improve fruit quality in blueberry have been greatly hampered by the lack of adequate genomic resources and a limited understanding of the genetics underlying key traits. The genome of highbush blueberry has been particularly challenging to assemble due, in large part, to its polyploid nature and genome size.
Findings
Here, we present a chromosome-scale and haplotype-phased genome assembly of the cultivar "Draper," which has the highest antioxidant levels among a diversity panel of 71 cultivars and 13 wild Vaccinium species. We leveraged this genome, combined with gene expression and metabolite data measured across fruit development, to identify candidate genes involved in the biosynthesis of important phytonutrients, among other metabolites associated with superior fruit quality. Genome-wide analyses revealed that both polyploidy and tandem gene duplications modified various pathways involved in the biosynthesis of key phytonutrients. Furthermore, gene expression analyses hint at the presence of a dominantly expressed subgenome with spatial and temporal specificity, including during fruit development.
Conclusions
These findings and the reference genome will serve as a valuable resource to guide future genome-enabled breeding of important agronomic traits in highbush blueberry.
Summary
Black raspberry (Rubus occidentalis) is an important specialty fruit crop in the US Pacific Northwest that can hybridize with the globally commercialized red raspberry (R. idaeus). Here we report a 243 Mb draft genome of black raspberry that will serve as a useful reference for the Rosaceae and Rubus fruit crops (raspberry, blackberry, and their hybrids). The black raspberry genome is largely collinear with the diploid woodland strawberry (Fragaria vesca), with a conserved karyotype and few notable structural rearrangements. Centromeric satellite repeats are widely dispersed across the black raspberry genome, in contrast to the tight association with the centromere observed in most plants. Among the 28,005 predicted protein-coding genes, we identified 290 very recent small-scale gene duplicates enriched for sugar metabolism, fruit development, and anthocyanin-related genes, which may be related to key agronomic traits during black raspberry domestication. This contrasts with patterns of recent duplications in the wild woodland strawberry F. vesca, which show no enrichment, suggesting that gene duplications contributed to domestication traits. Expression profiles from a fruit ripening series and from roots exposed to Verticillium dahliae provide insight into fruit development and disease response, respectively. The resources presented here will expedite the development of improved black and red raspberry, blackberry and other Rubus cultivars.
Plant genome size varies by four orders of magnitude, and most of this variation stems from dynamic changes in repetitive DNA content. Here we report the small 109 Mb genome of Selaginella lepidophylla, a clubmoss with extreme desiccation tolerance. Single-molecule sequencing enables accurate haplotype assembly of a single heterozygous S. lepidophylla plant, revealing extensive structural variation. We observe numerous haplotype-specific deletions consisting of largely repetitive and heavily methylated sequences, with enrichment in young Gypsy LTR retrotransposons. Such elements are active but rapidly deleted, suggesting “bloat and purge” to maintain a small genome size. Unlike all other land plant lineages, Selaginella has no evidence of a whole-genome duplication event in its evolutionary history, but instead shows unique tandem gene duplication patterns reflecting adaptation to extreme drying. Gene expression changes during desiccation in S. lepidophylla mirror patterns observed across angiosperm resurrection plants.
Background
The fragmented nature of most draft plant genomes has hindered downstream gene discovery, trait mapping for breeding, and other functional genomics applications. There is a pressing need to improve or finish draft plant genome assemblies.
Findings
Here, we present a chromosome-scale assembly of the black raspberry genome using single-molecule real-time Pacific Biosciences sequencing and high-throughput chromatin conformation capture (Hi-C) genome scaffolding. The updated V3 assembly has a contig N50 of 5.1 Mb, representing an ∼200-fold improvement over the previous Illumina-based version. Each of the 235 contigs was anchored and oriented into seven chromosomes, correcting several major misassemblies. Black raspberry V3 contains 47 Mb of new sequence, including large pericentromeric regions and thousands of previously unannotated protein-coding genes. Among the new genes are hundreds of expanded tandem gene arrays that were collapsed in the Illumina-based assembly. Detailed comparative genomics with the high-quality V4 woodland strawberry genome (Fragaria vesca) revealed near-perfect 1:1 synteny with dramatic divergence in tandem gene array composition. Lineage-specific tandem gene arrays in black raspberry are related to agronomic traits such as disease resistance and secondary metabolite biosynthesis.
Conclusions
The improved resolution of tandem gene arrays highlights the need to reassemble these highly complex and biologically important regions in draft plant genomes. The updated, high-quality black raspberry reference genome will be useful for comparative genomics across the horticulturally important Rosaceae family and will enable the development of marker-assisted breeding in Rubus.