Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly [1]. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking GC-skewed and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE) [2]. Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome, which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.

The genomes of Arabidopsis [3], rice [4], poplar, grape and Sorghum [5] were first sequenced using high-quality and reiterative Sanger-based approaches, producing a series of 'gold standard' reference genomes. The advent of next-generation sequencing (NGS) technologies reduced the cost of sequencing substantially, which has enabled the sequencing of over 100 plant genomes [1].
The quality of plant genome assemblies depends on genome size, ploidy, heterozygosity and sequence coverage, but most NGS-based genomes have on the order of tens of thousands of short contigs distributed in thousands of scaffolds. The short read lengths of NGS, inherent biases and non-random sequencing errors have resulted in highly fragmented draft genome assemblies that are incomplete: they are missing biologically meaningful sequences, including entire genes, regulatory regions, transposable elements, centromeres, telomeres and haplotype-specific structural variations. It is becoming clear from ENCODE projects that complete genomes are needed to better understand the importance of the non-coding regions of genomes [2].

More than 40% of calories consumed by humans are derived from grasses, and the grass family (Poaceae) is arguably the most important plant family with regard to global food security [6]. The size and complexity of most grass genomes have challenged progress in gene discovery and comparative genomics, although draft genomes are now available for most agriculturally important grasses [1]. The largest genome assemblies, such as maize (2,300 megabases (Mb)) [7], barley (5,100 Mb) [8] and wheat (hexaploid, 1...
These authors contributed equally to this work.

SUMMARY
Black raspberry (Rubus occidentalis) is an important specialty fruit crop in the US Pacific Northwest that can hybridize with the globally commercialized red raspberry (R. idaeus). Here we report a 243 Mb draft genome of black raspberry that will serve as a useful reference for the Rosaceae and Rubus fruit crops (raspberry, blackberry, and their hybrids). The black raspberry genome is largely collinear with the diploid woodland strawberry (Fragaria vesca), with a conserved karyotype and few notable structural rearrangements. Centromeric satellite repeats are widely dispersed across the black raspberry genome, in contrast to the tight association with the centromere observed in most plants. Among the 28,005 predicted protein-coding genes, we identified 290 very recent small-scale gene duplicates enriched for sugar metabolism, fruit development, and anthocyanin-related genes, which may be related to key agronomic traits during black raspberry domestication. This contrasts with patterns of recent duplication in the wild woodland strawberry F. vesca, which show no enrichment, suggesting that gene duplications contributed to domestication traits. Expression profiles from a fruit ripening series and from roots exposed to Verticillium dahliae provide insight into fruit development and disease response, respectively. The resources presented here will expedite the development of improved black raspberry, red raspberry, blackberry and other Rubus cultivars.
SUMMARY
Brachypodium distachyon is a small annual grass that has been adopted as a model for the grasses. Its small genome, high-quality reference genome, large germplasm collection, and selfing nature make it an excellent subject for studies of natural variation. We sequenced six divergent lines to identify a comprehensive set of polymorphisms and to analyze their distribution and concordance with gene expression. Multiple methods and controls were used to identify polymorphisms and validate their quality. mRNA-Seq experiments under control and simulated drought-stress conditions identified 300 genes with a genotype-dependent treatment response. We showed that large-scale sequence variants had extremely high concordance with altered expression of hundreds of genes, including many with genotype-dependent treatment responses. We generated a deep mRNA-Seq dataset for the most divergent line and created a de novo transcriptome assembly. This led to the discovery of >2400 previously unannotated transcripts and hundreds of genes not present in the reference genome. We built a public database for visualization and investigation of sequence variants among these widely used inbred lines.
Plant genome size varies by four orders of magnitude, and most of this variation stems from dynamic changes in repetitive DNA content. Here we report the small 109 Mb genome of Selaginella lepidophylla, a clubmoss with extreme desiccation tolerance. Single-molecule sequencing enables accurate haplotype assembly of a single heterozygous S. lepidophylla plant, revealing extensive structural variation. We observe numerous haplotype-specific deletions consisting of largely repetitive and heavily methylated sequences, with enrichment in young Gypsy LTR retrotransposons. Such elements are active but rapidly deleted, suggesting “bloat and purge” to maintain a small genome size. Unlike all other land plant lineages, Selaginella has no evidence of a whole-genome duplication event in its evolutionary history, but instead shows unique tandem gene duplication patterns reflecting adaptation to extreme drying. Gene expression changes during desiccation in S. lepidophylla mirror patterns observed across angiosperm resurrection plants.
Resurrection plants desiccate during periods of prolonged drought stress, then resume normal cellular metabolism once water becomes available. Desiccation tolerance has multiple origins in flowering plants, and it likely evolved through rewiring of seed desiccation pathways. Oropetium thomaeum is an emerging model for extreme drought tolerance, and its genome, which is the smallest among surveyed grasses, was recently sequenced. Combining RNA-seq, targeted metabolite analysis and comparative genomics, we show evidence for co-option of seed-specific pathways during vegetative desiccation. Desiccation-related gene co-expression clusters are enriched in functions related to seed development, including several seed-specific transcription factors. Across the metabolic network, pathways involved in programmed cell death inhibition, ABA signalling and others are activated during dehydration. Oleosins and oil bodies, which typically function in seed storage, are highly abundant in desiccated leaves and may contribute to membrane stability and storage. Orthologs of seed-specific LEA proteins from rice and maize have neofunctionalized in Oropetium, with high expression during desiccation. Accumulation of sucrose, raffinose and stachyose in drying leaves mirrors sugar accumulation patterns in maturing seeds. Together, these results connect vegetative desiccation with existing seed desiccation and drought-responsive pathways and provide key candidate genes for engineering improved drought tolerance in crop plants.
European hazelnut (Corylus avellana L.) is of global agricultural and economic significance, with genetic diversity existing in hundreds of accessions. Breeding efforts have focused on maximizing nut yield and quality and reducing susceptibility to diseases such as Eastern filbert blight (EFB). Here we present the first sequenced genome in the order Fagales, that of the EFB-resistant diploid hazelnut accession 'Jefferson' (OSU 703.007). We assembled the highly heterozygous hazelnut genome using an Illumina-only approach, and the final assembly has a scaffold N50 of 21.5 kb. We captured approximately 91 percent (345 Mb) of the flow-cytometry-determined genome size and identified 34,910 putative gene loci. In addition, we identified over 2 million polymorphisms across seven diverse hazelnut accessions and characterized their effect on coding sequences. We produced two high-density genetic maps with 3,209 markers from an F1 hazelnut population, representing a five-fold increase in marker density over previous maps. These genomic resources will aid in the discovery of molecular markers linked to genes of interest for hazelnut breeding efforts, and are available to the community at https://www.cavellanagenomeportal.com/.
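Both assemblies above are summarized by an N50 statistic (a 2.4 Mb contig N50 for Oropetium and a 21.5 kb scaffold N50 for hazelnut). As a minimal sketch of how that number is obtained, assuming the usual definition (the length L such that contigs or scaffolds of length at least L together contain at least half of all assembled bases), the Python below sorts lengths from longest to shortest and returns the length at which the running total first reaches half of the total. The function name `n50` and the toy lengths are hypothetical illustrations, not data from either study.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    hold at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Comparing 2 * running with total avoids fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches total")


# Hypothetical toy assembly: contig lengths in bp (not real data).
contigs = [8, 6, 5, 4, 3, 2, 2]
print(n50(contigs))
```

For the toy input the total is 30 bp; the running sum is 14 bp after two contigs and 19 bp (at least 15) after the third, so the N50 is 5. Because the denominator here is the assembly size, an assembly that omits hard-to-assemble sequence can still report a high N50; the related NG50 statistic uses an estimated genome size instead.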
Many endangered captive populations exhibit reduced genetic diversity, resulting in health issues that impact reproductive fitness and quality of life. Numerous cost-effective genomic sequencing and genotyping technologies provide an unparalleled opportunity for incorporating genomics knowledge into the management of endangered species. Genomic data, such as sequence data, transcriptome data and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be used to maximize population genetic variation while reducing unintended introduction or propagation of undesirable phenotypes. Current approaches to managing endangered captive populations use species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms, for which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provide value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enables an informed approach to endangered species management.
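The mean kinship estimates that species survival plans rely on can be illustrated with a small sketch. Assuming the common formulation in which an individual's mean kinship is the average of its kinship coefficients with all N individuals in the population, itself included, the Python below computes mean kinship from a kinship matrix and ranks individuals from lowest to highest. The pedigree, the matrix values and the function name `mean_kinship` are hypothetical; in practice the kinship coefficients would come from pedigree (studbook) analysis or, increasingly, from genomic data.

```python
from typing import Dict, List


def mean_kinship(ids: List[str], kinship: List[List[float]]) -> Dict[str, float]:
    """Return each individual's mean kinship: the average of its pairwise
    kinship coefficients with all N individuals, itself included."""
    n = len(ids)
    if len(kinship) != n or any(len(row) != n for row in kinship):
        raise ValueError("kinship must be an N x N matrix matching ids")
    # Kinship is symmetric: f(i, j) == f(j, i).
    for i in range(n):
        for j in range(i + 1, n):
            if kinship[i][j] != kinship[j][i]:
                raise ValueError("kinship matrix must be symmetric")
    return {ids[i]: sum(kinship[i]) / n for i in range(n)}


# Hypothetical pedigree: founders A, B and C are unrelated and non-inbred
# (self-kinship 0.5); D is an offspring of A x B (kinship 0.25 with each parent).
ids = ["A", "B", "C", "D"]
K = [
    [0.50, 0.00, 0.00, 0.25],
    [0.00, 0.50, 0.00, 0.25],
    [0.00, 0.00, 0.50, 0.00],
    [0.25, 0.25, 0.00, 0.50],
]
mk = mean_kinship(ids, K)
# Lower mean kinship means fewer relatives in the population,
# so under a minimize-mean-kinship strategy it means higher breeding priority.
priority = sorted(mk, key=mk.get)
print(mk)
print(priority)
```

Here founder C, with no relatives in the population, has the lowest mean kinship (0.125), while D, whose parents are both present, has the highest (0.25). Ranking on this value is one way to favor under-represented lineages; it says nothing about the phenotypic effects of specific variants, which is the gap genomic data are meant to fill.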