Summary: Next‐generation sequencing technologies allow an almost exhaustive survey of the transcriptome, even in species with no available genome sequence. To produce a Unigene set representing most of the expressed genes of pea, 20 cDNA libraries produced from various plant tissues harvested at various developmental stages from plants grown under contrasting nitrogen conditions were sequenced. Around one billion reads and 100 Gb of sequence were de novo assembled. Following several steps of redundancy reduction, 46 099 contigs with N50 length of 1667 nt were identified. These constitute the ‘Caméor’ Unigene set. The high depth of sequencing allowed identification of rare transcripts and detected expression for approximately 80% of contigs in each library. The Unigene set is now available online (http://bios.dijon.inra.fr/FATAL/cgi/pscam.cgi), allowing (i) searches for pea orthologs of candidate genes based on gene sequences from other species, or based on annotation, (ii) determination of transcript expression patterns using various metrics, (iii) identification of uncharacterized genes with interesting patterns of expression, and (iv) comparison of gene ontology pathways between tissues. This resource has allowed identification of the pea orthologs of major nodulation genes characterized in recent years in model species, as a major step towards deciphering unresolved pea nodulation phenotypes. In addition to a remarkable conservation of the early transcriptome nodulation apparatus between pea and Medicago truncatula, some specific features were highlighted. The resource provides a reference for the pea exome, and will facilitate transcriptome and proteome approaches as well as SNP discovery in pea.
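The N50 figure quoted in the abstract above is a standard assembly statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the pea dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches half of the summed length; the length of
    the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (total length 10 000 nt), for illustration only.
toy_lengths = [500, 800, 1200, 1700, 2500, 3300]
print(n50(toy_lengths))  # 3300 + 2500 = 5800 >= 5000, so this prints 2500
```

Because N50 weights long contigs heavily, it is usually read alongside the contig count and total assembled length rather than on its own.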
Ecological speciation entails divergent selection on specific traits, and ultimately on the developmental pathways responsible for these traits. Selection can act on gene sequences, but also on regulatory regions responsible for gene expression. Mimetic butterflies are a relevant system for speciation studies because wing color pattern (WCP) often diverges between closely related taxa, and is thought to drive speciation through assortative mating and increased predation on hybrids. Here we generate the first transcriptomic resources for a mimetic butterfly of the tribe Ithomiini, Melinaea marsaeus, to examine patterns of differential expression between two subspecies and between tissues that express traits that likely drive reproductive isolation: WCP and chemosensory genes. We sequenced whole transcriptomes of three life stages to cover a large catalogue of transcripts and we investigated differential expression between subspecies in pupal wing discs and antennae. Eighteen known WCP genes were expressed in wing discs and 115 chemosensory genes were expressed in antennae, with a remarkable diversity of chemosensory protein genes. Many transcripts were differentially expressed between subspecies, including two WCP genes and one odorant receptor. Our results suggest that in M. marsaeus the same genes as in other mimetic butterflies are involved in traits causing reproductive isolation, and point to possible candidates for the differences in those traits between subspecies. Differential expression analyses of other developmental stages and body organs and functional studies are needed to confirm and expand these results. Our work provides key resources for comparative genomics in mimetic butterflies, and more generally in Lepidoptera.
Background: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools.
Findings: Dedicated to ‘whole-genome assembly-free’ treatments, the Colib’read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Based on the use of a de Bruijn graph and Bloom filter, such analyses can be performed in a few hours, using small amounts of memory. Applications using real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tool dissemination, we developed Galaxy tools and tool shed repositories.
Conclusions: With the Colib’read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint.
Electronic supplementary material: The online version of this article (doi:10.1186/s13742-015-0105-2) contains supplementary material, which is available to authorized users.
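To illustrate why a Bloom filter keeps memory use low when working on unassembled reads, the sketch below stores the k-mers of one read set in a fixed-size bit array and then asks what fraction of k-mers from a second read set are (probably) present. This is a generic teaching example, not the Colib’read implementation: the class `BloomFilter`, the helpers `kmers` and `shared_kmer_fraction`, the filter size, the number of hash functions, and the value of `k` are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, rare false positives."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different prefixes.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every substring of length k from a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def shared_kmer_fraction(reads_a, reads_b, k=5):
    """Fraction of k-mers in reads_b that the filter reports as seen in reads_a."""
    seen = BloomFilter()
    for read in reads_a:
        for kmer in kmers(read, k):
            seen.add(kmer)
    total = hits = 0
    for read in reads_b:
        for kmer in kmers(read, k):
            total += 1
            hits += kmer in seen
    return hits / total if total else 0.0


# Hypothetical reads, for illustration only.
print(shared_kmer_fraction(["ACGTACGTAC"], ["ACGTACGTAC", "TTTTTGGGGG"]))
```

The memory cost is set by the bit array rather than by the number of distinct k-mers, which is the trade-off that makes this kind of structure attractive for large read sets; the price is that a membership answer of "present" may occasionally be wrong.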