Cannabis is a diverse and polymorphic species. To better understand cannabinoid synthesis inheritance and its impact on pathogen resistance, we shotgun sequenced and assembled a Cannabis trio (sibling pair and their offspring) using long-read single-molecule sequencing. This resulted in the most contiguous Cannabis sativa assemblies to date. These reference assemblies were further annotated with full-length male and female mRNA sequencing (Iso-Seq) to help inform isoform complexity, gene model predictions and identification of the Y chromosome. To further annotate the genetic diversity in the species, 40 male, female, and monoecious cannabis and hemp varietals were evaluated for copy number variation (CNV) and RNA expression. This identified multiple CNVs governing cannabinoid expression and 82 genes associated with resistance to Golovinomyces cichoracearum, the causal agent of powdery mildew in cannabis. Results indicated that breeding for plants with low tetrahydrocannabinolic acid (THCA) concentrations may result in deletion of pathogen resistance genes. Low-THCA cultivars also exhibited a polymorphism every 51 bases, while dispensary-grade high-THCA cannabis exhibited a variant every 73 bases. A refined genetic map of the variation in cannabis can guide more stable and directed breeding efforts for desired chemotypes and pathogen-resistant cultivars.
Sequence and annotation of 42 cannabis genomes reveals extensive copy number variation in cannabinoid synthesis and pathogen resistance genes
Maize (Zea mays) possesses a large, highly repetitive genome, and consequently a number of reduced-representation sequencing approaches have been used to enrich for gene space while avoiding the difficulties associated with repetitive DNA. This article documents the ability of publicly available maize expressed sequence tags (ESTs) and Genome Survey Sequences (GSSs; many of which were isolated through the use of reduced-representation techniques) to recognize and provide coverage of 78 maize full-length cDNAs (FLCs). All 78 FLCs in the dataset were identified by at least three GSSs, indicating that the majority of maize genes have been identified by at least one currently available GSS. Both methyl-filtration and high-Cot enrichment methods provided a 7- to 8-fold increase in gene discovery rates as compared to random sequencing. The available maize GSSs aligned to 75% of the FLC nucleotides used to perform searches, while the EST sequences aligned to 73% of the nucleotides. Our data suggest that approximately 95% or more of maize genes have been tagged by at least one GSS. While the GSSs are very effective for gene identification, relatively few (18%) of the FLCs are completely represented by GSSs. Analysis of the overlap of coverage and of bias due to position within a gene suggests that RescueMu, methyl-filtration, and high-Cot methods are at least partially nonredundant.
The last eukaryotic common ancestor had two classes of introns that are still found in most eukaryotic lineages. Common U2-type and rare U12-type introns are spliced by the major and minor spliceosomes, respectively. Relatively few splicing factors have been shown to be specific to the minor spliceosome. We found that the maize RNA Binding Motif Protein48 (RBM48) is a U12 splicing factor that functions to promote cell differentiation and repress cell proliferation. RBM48 is coselected with the U12 splicing factor, ZRSR2/RGH3. Protein-protein interactions between RBM48, RGH3, and U2 Auxiliary Factor (U2AF) subunits suggest major and minor spliceosome factors may form complexes during intron recognition. Human RBM48 interacts with ARMC7. Maize RBM48 and ARMC7 have a conserved protein-protein interaction. These data predict that RBM48 is likely to function in U12 splicing throughout eukaryotes and that U12 splicing promotes endosperm cell differentiation in maize.
We describe the use of a Decentralized Autonomous Organization (DAO) to crypto-fund the single-molecule sequencing and publication of a Type II Cannabis plant. This resulted in the construction of the most contiguous Cannabis genome assembly to date. The combined use of the Dash cryptocurrency, DAOs, and Pacific Biosciences sequencing delivered a 1.03 Gb genome with an N50 of 665 kb in 77 days from funding to public upload. This represents a 230-fold improvement in contiguity over the first cannabis assemblies in 2011 and a 4-fold improvement over all cannabis assemblies to date. An additional 34 Gb of sequencing pushed the assembly to an N50 of 3.8 Mb. Hi-C data from Phase Genomics further scaffolded the assembly to 35 contigs at an N50 of 74 Mb, but the scaffolded assembly requires additional curation. The genome is partially phased and larger than previously reported (2N = 1.33 Gb). The CBCA, THCA and CBDA synthase gene clusters have been phased onto respective contigs, demonstrating tandem repeat expansions.
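The contiguity figures above are reported as N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal Python sketch of that calculation follows. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the study or its assembly pipeline.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty positive lengths")


# Toy example (hypothetical lengths, not real assembly data).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

For the toy lengths, the total is 54, and the three longest contigs (10, 9, 8) sum to 27, exactly half, so the N50 is 8. Because N50 depends only on the length distribution, scaffolding with Hi-C can raise it sharply without adding sequence, which is consistent with the caveat that the scaffolded assembly still needs curation.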
Strawberry (Fragaria spp.) is a valuable fruit crop as well as an outstanding system for studying functional genomics in plants. The goal of this study was to substantially increase and analyze the available expressed sequence information in the genus by examining the transcriptome of the cultivated strawberry (Fragaria × ananassa Duchesne). To maximize transcript diversity and discovery, plants representing an octoploid strawberry cultivar were subjected to a broad range of treatments. Plant materials were pooled by tissue type. cDNA pools were sequenced by the Roche‐454 GS‐FLX system and assembled into over 32,000 contigs. Predictions of cellular localization and function were made by associating assembled contigs to annotated homologs, and the tissue pool tags provided a means to assess the overall expression pattern for any given transcript. Contigs comprised of reads originating from only one organ type and those present equally in all plant organs were both identified. Bacterial and fungal sequences found in the strawberry samples provide a metagenomic survey of the microbial community of a greenhouse strawberry plant. This study utilized an innovative assembly strategy on pooled tissues, thus providing a foundation for developing tissue‐specific tools, an opportunity to identify alleles for marker‐assisted selection, a reference of strawberry gene annotations, and a basis for comparative transcriptomics between cultivated strawberry, its diploid ancestors, and the wider Rosaceae family.
Ferns are the second largest clade of vascular plants with over 10,000 species, yet the generation of genomic resources for the group has lagged behind other major clades of plants. Transcriptomic data have proven to be a powerful tool to assess phylogenetic relationships, using thousands of markers that are largely conserved across the genome, and without the need to sequence entire genomes. We assembled the largest nuclear phylogenetic dataset for ferns to date, including 2884 single-copy nuclear loci from 247 transcriptomes (242 ferns, five outgroups), and investigated phylogenetic relationships across the fern tree, the placement of whole genome duplications (WGDs), and gene retention patterns following WGDs. We generated a well-supported phylogeny of ferns and identified several regions of the fern phylogeny that demonstrate high levels of gene tree–species tree conflict, which largely correspond to areas of the phylogeny that have been difficult to resolve. Using a combination of approaches, we identified 27 WGDs across the phylogeny, including 18 large-scale events (involving more than one sampled taxon) and nine small-scale events (involving only one sampled taxon). Most inferred WGDs occur within single lineages (e.g., orders, families) rather than on the backbone of the phylogeny, although two inferred events are shared by leptosporangiate ferns (excluding Osmundales) and Polypodiales (excluding Lindsaeineae and Saccolomatineae), clades which correspond to the majority of fern diversity. We further examined how retained duplicates following WGDs compared across independent events and found that functions of retained genes were largely convergent, with processes involved in binding, responses to stimuli, and certain organelles over-represented in paralogs while processes involved in transport, organelles derived from endosymbiotic events, and signaling were under-represented. 
To date, our study is the most comprehensive investigation of the nuclear fern phylogeny, though several avenues for future research remain unexplored.
One difficulty when identifying and analyzing alternative splicing (AS) events in plants is distinguishing functional AS from splicing noise. One way to add confidence to the validity of a splice isoform is to observe that it is conserved across evolutionarily related species. We use a high-throughput method to identify junction-based conserved AS events from RNA-Seq data across nine plant species: five grass monocots (maize, sorghum, rice, Brachypodium and foxtail millet), two non-grass monocots (banana and African oil palm), the eudicot Arabidopsis and the basal angiosperm Amborella. In total, 9,804 AS events within 19,235 genes were identified as conserved between two or more of the species studied. In grasses containing large regions of conserved synteny, the frequency of conserved AS events is twice that observed for genes outside of conserved synteny blocks. In plant-specific RS and RS2Z subfamilies, we observe both conservation and divergence of AS events after the whole genome duplication in maize. In addition, plant-specific RS and RS2Z subfamilies are highly connected with R2R3-MYB in splicing networks. Furthermore, we discovered that the network based on genes harboring conserved AS events is enriched for phosphatases, kinases and ubiquitylation genes, which suggests that AS may participate in regulating signaling pathways. These data lay the foundation for identifying and studying conserved AS events in the monocots, particularly across grass species, and this conserved AS resource identifies an additional layer between genotype and phenotype that may impact future crop improvement efforts.
INTRODUCTION
In this protocol, 454 expressed sequence tags (ESTs) are generated by sequencing shoot apical meristem (SAM) cDNA from maize inbred lines on the 454 Life Sciences GS-20 sequencing system. The computational tool PolyBayes (Marth et al. 1999) is then used to identify single-nucleotide polymorphisms (SNPs). PolyBayes has been used successfully to identify SNPs in many different systems, including maize, and is particularly recommended for identifying SNPs in 454 sequences.