In transcriptomics, the study of the total set of RNAs transcribed by the cell, RNA sequencing (RNA-seq) has become the standard tool for analysing gene expression. The primary goal is the detection of genes whose expression changes significantly between two or more conditions, either for a single species or for two or more interacting species at the same time (dual RNA-seq, triple RNA-seq and so forth). The analysis of RNA-seq data can be simplified because many of the pre-processing steps can be standardised in a pipeline.
In this publication we present the "GEO2RNAseq" pipeline for complete, quick and concurrent pre-processing of single, dual, and triple RNA-seq data. It covers all pre-processing steps, starting from raw sequencing data to the analysis of differentially expressed genes, including various tables and figures to report intermediate and final results. Raw data may be provided in FASTQ format or can be downloaded automatically from the Gene Expression Omnibus repository. GEO2RNAseq strongly incorporates experimental as well as computational metadata. GEO2RNAseq is implemented in R, lightweight, easy to install via Conda and easy to use, yet remains very flexible thanks to modular programming and many extensions and alternative workflows.
GEO2RNAseq is publicly available at https://anaconda.org/xentrics/r-geo2rnaseq and https://bitbucket.org/thomas_wolf/geo2rnaseq/overview, including source code, installation instructions, and comprehensive package documentation.
Seelbinder et al.
GEO2RNAseq
[...] the detection of genes whose expression changes significantly between two or more conditions, and the functional relationship of these genes. RNA sequencing (RNA-seq) offers a complete, fast, and cheap way to perform transcriptomics of single organisms using next-generation sequencing technologies (Mardis, 2008). However, species do not exist in isolation. In fact, interspecies interactions are a major part of environmental adaptation. The special variant dual RNA-seq can be applied to analyse the transcriptomes of two interacting species at the same time by separating their RNA in silico (Schulze et al., 2016; Wolf et al., 2018). The concept of dual RNA-seq can be further extended: triple RNA-seq allows investigating the interaction of three organisms, e. g. a host and two competing pathogens.
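To make the idea of separating RNA in silico concrete, the following minimal Python sketch assigns aligned reads to species. It is not the GEO2RNAseq implementation. It assumes that reads were mapped against a concatenated reference in which every sequence name carries a species prefix (for example `host_chr1`); this naming convention, the function name `split_reads_by_species` and the example data are all hypothetical.

```python
def split_reads_by_species(alignments, prefix_to_species):
    """Group read IDs by species using the prefix of the reference name.

    alignments: iterable of (read_id, reference_name) pairs.
    prefix_to_species: maps a reference-name prefix to a species label.
    Assumption (hypothetical convention): reference names look like
    "<prefix>_<sequence>", e.g. "host_chr1".
    """
    per_species = {species: [] for species in prefix_to_species.values()}
    per_species["unassigned"] = []
    for read_id, ref_name in alignments:
        # Everything before the first underscore is treated as the prefix.
        prefix = ref_name.split("_", 1)[0]
        species = prefix_to_species.get(prefix, "unassigned")
        per_species[species].append(read_id)
    return per_species


# Illustrative data for a dual RNA-seq sample (host plus one pathogen).
example_alignments = [
    ("read1", "host_chr1"),
    ("read2", "fungus_chr3"),
    ("read3", "host_chr2"),
    ("read4", "phiX"),  # no known prefix, so it stays unassigned
]
example_prefixes = {"host": "host", "fungus": "pathogen"}

if __name__ == "__main__":
    groups = split_reads_by_species(example_alignments, example_prefixes)
    for species, reads in groups.items():
        print(species, len(reads))
```

Extending the sketch to triple RNA-seq only requires a third entry in the prefix mapping. Real pipelines additionally have to decide how to treat reads that align equally well to more than one species, which this sketch does not address.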
Both the number of scientists applying RNA-seq and the number of published datasets have been growing exponentially¹ (Deelen et al., 2014). However, the bottlenecks in transcriptomics are the small number of experts able to pre-process and analyse RNA-seq data, and the small number of easy-to-use tools. A number of pipelines have been published in R to address this issue, but none of them includes the complete set of pre-processing steps and none fully exploits the available metadata.
Extensive utilisation of metadata is highly important. Wet-lab metadata, e. g. temperature or pH, and dry-lab metadata, e. ...