Massively parallel sequencing holds great promise for expression profiling, as it combines the high throughput of SAGE with the accuracy of EST sequencing. Nevertheless, until now only very limited information had been available on the suitability of the current technology to meet the requirements. Here, we evaluate the potential of 454 sequencing technology for expression profiling using Drosophila melanogaster. We show that short (< ∼80 bp) and long (> ∼300-400 bp) cDNA fragments are under-represented in 454 sequence reads. Nevertheless, sequencing of 3′ cDNA fragments generated by nebulization could be used to overcome the length bias of the 454 sequencing technology. Gene expression measurements generated by restriction analysis and nebulization for fragments within the 80- to 300-bp range showed correlations similar to those reported for replicated microarray experiments (0.83-0.91); 97% of the cDNA fragments could be unambiguously mapped to the genomic DNA, demonstrating the advantage of longer sequence reads. Our analyses suggest that the 454 technology has a large potential for expression profiling, and the high mapping accuracy indicates that it should be possible to compare expression profiles across species.
[Supplemental material is available online at www.genome.org. The EST sequences have been deposited in GenBank under accession nos. EV574767-EV600806.]
Gene expression technologies have greatly matured over the past years, but it has become clear that hybridization-based approaches have obvious limitations in cross-species comparisons (Gilad et al. 2005, 2006). Probably the most eminent problems are mismatches in heterologous probes and probe-specific hybridization kinetics, which complicate the design of species-specific oligonucleotide arrays. Alternatively, sequencing-based approaches could be used to measure gene expression if the sequence reads could be unambiguously mapped to the corresponding transcripts.
While the short sequence reads of serial analysis of gene expression (SAGE) (Velculescu et al. 1995) and related techniques are severely limited by the requirement of a reliable genome annotation, the recently developed 454 sequencing technology (Margulies et al. 2005) may provide sufficient sequence information to overcome this limitation at moderate cost.
In this study, we evaluate the potential of 454 sequencing technology to serve as a reliable tool for expression profiling. We show that 454 sequencing technology has a biased representation of cDNA fragments of different lengths. However, in combination with random breakage of the cDNAs by nebulization, 454 sequencing provides an excellent tool for expression profiling. The high accuracy with which we could map the sequenced fragments onto the Drosophila melanogaster genome suggests that 454 sequencing has great potential for interspecific expression profiling.
Results
Conceptual design
Measuring gene expression by sequencing requires only that a proportion of the transcript be analyzed. We sequenced a 3′ region of the cDNA to...