Background RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. For organisms that lack a well-annotated reference genome or transcriptome, a conventional RNA-seq data analysis workflow requires constructing a de novo transcriptome assembly and annotating it against a high-confidence protein database. The assembly serves as a reference for read mapping, and the annotation is necessary for functional analysis of genes found to be differentially expressed. However, assembly is computationally expensive. It is also prone to errors that impact expression analysis, especially since sequencing depth is typically much lower for expression studies than for transcript discovery.
Results We propose a shortcut, in which we obtain counts for differential expression analysis by directly aligning RNA-seq reads to the high-confidence proteome that would otherwise have been used for annotation. By avoiding assembly, we drastically cut down computational costs: the running time on a typical dataset improves from the order of tens of hours to under half an hour, and the memory requirement is reduced from the order of tens of Gbytes to tens of Mbytes. We show through experiments on simulated and real data that our pipeline not only reduces computational costs, but also has higher sensitivity and precision than a typical assembly-based pipeline. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar.
Conclusions The flip side of RNA-seq becoming accessible to even modestly resourced labs has been that the time, labor, and infrastructure cost of bioinformatics analysis has become a bottleneck. Assembly is one such resource-hungry process, and we show here that it can be avoided for quick and easy, yet more sensitive and precise, differential gene expression analysis in non-model organisms.
Background The fishery and aquaculture of the widely distributed mangrove crab Scylla serrata is a steadily growing, high-value, global industry. Climate change poses a risk to this industry as temperature elevations are expected to threaten the mangrove crab habitat and the supply of mangrove crab juveniles from the wild. It is therefore important to understand the genomic and molecular basis of how mangrove crab populations from sites with different climate profiles respond to heat stress. Towards this, we performed RNA-seq on the gill tissue of S. serrata individuals sampled from 3 sites (Cagayan, Bicol, and Bataan) in the Philippines, under normal and heat-stressed conditions. To compare the transcriptome expression profiles, we designed a 2-factor generalized linear model containing interaction terms, which allowed us to simultaneously analyze within-site response to heat stress and across-site differences in the response.
Results We present the first ever transcriptome assembly of S. serrata obtained from a data set containing 66 Gbases of cleaned RNA-seq reads. With lowly expressed and short contigs excluded, the assembly contains roughly 17,000 genes with an N50 length of 2,366 bp. Our assembly contains many almost full-length transcripts: 5229 shrimp and 3049 fruit fly proteins align to a contig over >80% of their sequence length. Differential expression analysis found population-specific differences in heat-stress response. Within-site analysis of heat-stress response showed 177, 755, and 221 differentially expressed (DE) genes in the Cagayan, Bataan, and Bicol groups, respectively. Across-site analysis showed that between Cagayan and Bataan, there were 389 genes associated with 48 signaling and stress-response pathways, for which there was an effect of site in the response to heat; and between Cagayan and Bicol, there were 101 such genes affecting 8 pathways.
Conclusion In light of previous work on climate profiling and on population genetics of marine species in the Philippines, our findings suggest that the variation in thermal response among populations might be derived from acclimatory plasticity due to pre-exposure to extreme temperature variations or from population structure shaped by connectivity which leads to adaptive genetic differences among populations.
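The 2-factor model with interaction terms described in the abstract above can be illustrated with a small design matrix. The following is only a sketch under stated assumptions, not the authors' code: the choice of Cagayan as the reference site, the column names, and the helpers `design_row`, `design_matrix`, and `contrast` are all hypothetical. It shows how within-site heat responses and across-site differences in that response map to contrasts over the model coefficients.

```python
import numpy as np

# Assumption: Cagayan is the reference site; the abstract does not state the coding.
SITES = ["Cagayan", "Bataan", "Bicol"]
COLUMNS = ["intercept", "Bataan", "Bicol", "heat", "Bataan:heat", "Bicol:heat"]


def design_row(site, heated):
    """Return one design-matrix row for a sample from `site` (heated or control)."""
    row = dict.fromkeys(COLUMNS, 0.0)
    row["intercept"] = 1.0
    if site != SITES[0]:
        row[site] = 1.0                  # main effect of site
    if heated:
        row["heat"] = 1.0                # main effect of heat stress
        if site != SITES[0]:
            row[f"{site}:heat"] = 1.0    # site-by-heat interaction
    return [row[c] for c in COLUMNS]


def design_matrix(samples):
    """Build the design matrix from an iterable of (site, heated) pairs."""
    return np.array([design_row(site, heated) for site, heated in samples])


def contrast(weights):
    """Turn a {column name: weight} mapping into a contrast vector."""
    return np.array([float(weights.get(c, 0.0)) for c in COLUMNS])


# Within-site heat response: heated minus control inside one site.
# Across-site difference in response: the interaction coefficient alone.
CONTRASTS = {
    "heat response in Cagayan": contrast({"heat": 1}),
    "heat response in Bataan": contrast({"heat": 1, "Bataan:heat": 1}),
    "heat response in Bicol": contrast({"heat": 1, "Bicol:heat": 1}),
    "Bataan vs Cagayan response": contrast({"Bataan:heat": 1}),
    "Bicol vs Cagayan response": contrast({"Bicol:heat": 1}),
}
```

With this coding, a gene is tested for a within-site heat response by asking whether the corresponding contrast of fitted coefficients differs from zero, and for an across-site difference by testing the interaction coefficient alone. In practice such a design would be passed to a count-based GLM framework; the abstract does not say which one was used.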
Background: The fishery and aquaculture of the widely distributed mangrove crab Scylla serrata is a steadily growing, high-value, global industry. Climate change poses a risk to this industry as temperature elevations are expected to threaten the mangrove crab habitat and the supply of mangrove crab seeds from the wild. It is therefore important to understand the genomic and molecular basis of how mangrove crab populations from sites with different climate profiles respond to heat stress. Towards this, we performed RNA-seq on the gill tissue of S. serrata individuals sampled from 3 sites (Cagayan, Bicol, and Bataan) in the Philippines, under normal and heat-stressed conditions. To compare the transcriptome expression profiles, we designed a 2-factor generalized linear model containing interaction terms, which allowed us to simultaneously analyze within-site response to heat stress and across-site differences in the response.
Results: We present the first ever transcriptome assembly of S. serrata obtained from a massive data set containing ~66 Gbases of cleaned RNA-seq reads. With lowly expressed and short contigs excluded, the assembly contains roughly 17,000 genes with an N50 length of 2,366 bp. Based on sequence comparison to the fruit fly and shrimp proteomes, our assembly contains several thousand almost full-length transcripts. Differential expression analysis found population-specific differences in heat-stress response. Within-site analysis of heat response showed 177, 755, and 221 differentially expressed (DE) genes in the Cagayan, Bataan, and Bicol groups, respectively. Across-site analysis of difference in heat response showed that between Cagayan and Bataan, there were 389 differently differentially expressed (DDE) genes associated with 48 signalling and stress-response pathways; and between Cagayan and Bicol, there were 101 DDE genes affecting 8 pathways.
Conclusion: In light of previous work on climate profiling and on population genetics of marine species in the Philippines, our findings suggest that the variation in thermal response among populations might be derived from acclimatory plasticity due to pre-exposure to extreme temperature variations or from population structure shaped by connectivity which leads to adaptive genetic differences among populations.
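The N50 length quoted in the results is a standard assembly statistic: the largest length L such that sequences of length at least L together cover at least half of the total assembly length. A minimal sketch of that definition follows. The function name `n50` and the toy lengths in the usage note are illustrative only, and the abstract does not say which tool the authors used to compute the figure.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of the sum.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # integer comparison avoids float rounding
            return length
```

For example, for toy lengths 2, 3, 4, 5, and 6 the total is 20, and the two longest sequences (6 and 5) already cover 11 of it, so the N50 is 5.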