High-throughput sequencing of cDNA prepared from RNA, an approach known as RNA-seq, is increasingly used as a method for transcriptome analysis. Despite its many advantages, widespread adoption of the technique has been hampered by a lack of easy-to-use, integrated, open-source tools for analyzing the resulting nucleotide sequence data. Here we describe Xpression, an integrated tool for processing prokaryotic RNA-seq data. The tool is easy to use and fully automated. It performs all essential processing tasks, including nucleotide sequence extraction, alignment, quantification, normalization, and visualization. Importantly, Xpression processes multiplexed and strand-specific nucleotide sequence data: it extracts and trims specific sequences from files and separately quantifies sense and antisense reads in the final results. Outputs from the tool can also be conveniently used in downstream analysis. In this paper, we demonstrate the utility of Xpression by using it to process strand-specific RNA-seq data and identify genes regulated by CouR, a transcription factor that controls p-coumarate degradation by the bacterium Rhodopseudomonas palustris.
RNA-seq is a recently developed technique for global analysis of mRNA transcripts that uses high-throughput sequencing technology (18). It has a number of advantages over traditional microarray-based technologies, including improved sensitivity, increased dynamic range, and lower cost. As a result, it is becoming the preferred tool for gene expression studies. Despite these advantages, widespread adoption of RNA-seq is impeded by a lack of easy-to-use, integrated, open-source tools for processing the nucleotide sequence data that the technique generates. Each RNA-seq experiment produces millions of raw sequence reads, making it impossible to process the sequencing data without bioinformatic tools.