Background
The club-legged grasshopper
Gomphocerus sibiricus
is a Gomphocerinae grasshopper with a promising future as model species for studying the maintenance of colour-polymorphism, the genetics of sexual ornamentation and genome size evolution. However, limited molecular resources are available for this species. Here, we present a de novo transcriptome assembly as reference resource for gene expression studies. We used high-throughput Illumina sequencing to generate 5,070,036 paired-end reads after quality filtering. We then combined the best-assembled contigs from three different de novo transcriptome assemblers (Trinity, SOAPdenovo-trans and Oases/Velvet) into a single assembly.
Results
This resulted in 82,251 contigs with a N50 of 1357 and a TransRate assembly score of 0.325, which compares favourably with other orthopteran transcriptome assemblies. Around 87% of the transcripts could be annotated using InterProScan 5, BLASTx and the
dammit!
annotation pipeline. We identified a number of genes involved in pigmentation and green pigment metabolism pathways. Furthermore, we identified 76,221 putative single nucleotide polymorphisms residing in 8400 contigs. We also assembled the mitochondrial genome and investigated levels of sequence divergence with other species from the genus
Gomphocerus
. Finally, we detected and assembled
Wolbachia
sequences, which revealed close sequence similarity to the strain pel wPip.
Conclusions
Our study has generated a significant resource for uncovering genotype-phenotype associations in a species with an extraordinarily large genome, while also providing mitochondrial and
Wolbachia
sequences that will be useful for comparative studies.
Electronic supplementary material
The online version of this article (10.1186/s12864-019-5756-4) contains supplementary material, which is available to authorized users.