The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.
Animal transcriptomes are dynamic, each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. We identified new genes, transcripts, and proteins using poly(A)+ RNA sequence from Drosophila melanogaster cultured cell lines, dissected organ systems, and environmental perturbations. We found a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long noncoding RNAs (lncRNAs) some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized arising from combinatorial usage of promoters, splice sites, and polyadenylation sites.
Background: The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results: We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions: In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.
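The N50 scaffold size quoted above is a standard assembly contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The sketch below is only an illustration of that common definition, not code from the study; the function name `n50` and the toy lengths are assumptions made for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Common definition (assumed here): sort lengths from longest to
    shortest and return the length at which the running total first
    reaches at least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, not data from the assembly).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because the statistic depends only on the length distribution, a few very long scaffolds can raise it substantially, which is why it is usually reported alongside the total assembled size.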
The plant hormone auxin, in particular indole-3-acetic acid (IAA), is a key regulator of virtually every aspect of plant growth and development. Auxin regulates transcription by rapidly modulating levels of Aux/IAA proteins throughout development. Recent studies demonstrate that auxin perception occurs through a novel mechanism. Auxin binds to TIR1, the F-box subunit of the ubiquitin ligase complex SCF(TIR1), and stabilizes the interaction between TIR1 and Aux/IAA substrates. This interaction results in Aux/IAA ubiquitination and subsequent degradation. Regulation of the Aux/IAA protein family by TIR1 and TIR1-like auxin receptors (AFBs) links auxin action to transcriptional regulation and provides a model by which the vast array of auxin influences on development may be understood. Moreover, auxin receptor function is the first example of small-molecule regulation of an SCF ubiquitin ligase and may have important implications for studies of regulated protein degradation in other species, including animals.
The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%.
Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
Summary: Camelina (Camelina sativa), a Brassicaceae oilseed, has received recent interest as a biofuel crop and production platform for industrial oils. Limiting wider production of camelina for these uses is the need to improve the quality and content of the seed protein-rich meal and oil, which is enriched in oxidatively unstable polyunsaturated fatty acids that are deleterious for biodiesel. To identify candidate genes for meal and oil quality improvement, a transcriptome reference was built from 2047 Sanger ESTs and more than 2 million 454-derived sequence reads, representing genes expressed in developing camelina seeds. The transcriptome of approximately 60K transcripts from 22 597 putative genes includes camelina homologues of nearly all known seed-expressed genes, suggesting a high level of completeness and usefulness of the reference. These sequences included candidates for 12S (cruciferins) and 2S (napins) seed storage proteins (SSPs) and nearly all known lipid genes, which have been compiled into an accessible database. To demonstrate the utility of the transcriptome for seed quality modification, seed-specific RNAi lines deficient in napins were generated by targeting 2S SSP genes, and high oleic acid oil lines were obtained by targeting FATTY ACID DESATURASE 2 (FAD2) and FATTY ACID ELONGASE 1 (FAE1). The high sequence identity between Arabidopsis thaliana and camelina genes was also exploited to engineer high oleic lines by RNAi with Arabidopsis FAD2 and FAE1 sequences. It is expected that these transcriptomic data will be useful for breeding and engineering of additional camelina seed traits and for translating findings from the model Arabidopsis to an oilseed crop.
The association between fitness-related phenotypic traits and an environmental gradient offers one of the best opportunities to study the interplay between natural selection and migration. In cases in which specific genetic variants also show such clinal patterns, it may be possible to uncover the mutations responsible for local adaptation. The malaria vector, Anopheles gambiae, is associated with a latitudinal cline in aridity in Cameroon; a large inversion on chromosome 2L of this mosquito shows large differences in frequency along this cline, with high frequencies of the inverted karyotype present in northern, more arid populations and an almost complete absence of the inverted arrangement in southern populations. Here we use a genome resequencing approach to investigate patterns of population divergence along the cline. By sequencing pools of individuals from both ends of the cline as well as in the center of the cline (where the inversion is present in intermediate frequency), we demonstrate almost complete panmixia across collinear parts of the genome and high levels of differentiation in inverted parts of the genome. Sequencing of separate pools of each inversion arrangement in the center of the cline reveals large amounts of gene flux (i.e., gene conversion and double crossovers) even within inverted regions, especially away from the inversion breakpoints. The interplay between natural selection, migration, and gene flux allows us to identify several candidate genes responsible for the match between inversion frequency and environmental variables. These results, coupled with similar conclusions from studies of clinal variation in Drosophila, point to a number of important biological functions associated with local environmental adaptation.