We report a random survey of 1 to 2% of the somatic genome of the free-living ciliate Paramecium tetraurelia by single-run sequencing of the ends of plasmid inserts. As in all ciliates, the germ line genome of Paramecium (100 to 200 Mb) is reproducibly rearranged at each sexual cycle to produce a somatic genome of expressed or potentially expressed genes, stripped of repeated sequences, transposons, and AT-rich unique sequence elements limited to the germ line. We found the somatic genome to be compact (>68% coding, estimated from the sequence of several complete library inserts) and to feature uniformly small introns (18 to 35 nucleotides). This facilitated gene discovery: 722 open reading frames (ORFs) were identified by similarity with known proteins, and 119 novel ORFs were tentatively identified by internal comparison of the data set. Using the distance matrix neighbor-joining method on random combined protein data from the project, we determined the phylogenetic position of Paramecium with respect to eukaryotes whose genomes have been sequenced. The unrooted tree obtained is very robust and in excellent agreement with accepted topology, providing strong support for the quality and consistency of the data set. Our study demonstrates that a random survey of the somatic genome of Paramecium is a good strategy for gene discovery in this organism.

Alongside an ever-growing number of prokaryotic genomes, several fungal, invertebrate, plant, and vertebrate genomes have now been largely or completely sequenced and released to the public, providing a wealth of information for functional and comparative studies. Given the variety of unicellular eukaryotes and the great evolutionary distances even within protist phyla, it is striking that few protists have been subjected to systematic genomic investigation. Notable exceptions are the cellular slime mold Dictyostelium discoideum and a few parasites of great medical importance such as Plasmodium spp.
Ciliates are one of the major eukaryotic groups for which no large-scale genome project has been undertaken.

Ciliate models (Paramecium, Tetrahymena) have allowed major discoveries in biology such as variant nuclear genetic codes (13, 49), ribozymes (38), telomerase (33), and histone acetyltransferase as a transcription factor (10) and present fascinating epigenetic phenomena acting at DNA (14, 21, 42), RNA (51), and protein (6) levels. Unique among unicellular eukaryotes, ciliates separate germinal and somatic lines, in the form of nuclei (50). Somatic development involves programmed rearrangements of the entire germ line genome at each sexual generation, so that ciliates provide excellent experimental models for studying somatic DNA rearrangements similar to those that generate antibody diversity and malignant states in vertebrates.

The germ line micronucleus is diploid, is transcriptionally silent during vegetative growth, and intervenes during sexual processes. The somatic macronucleus is highly polyploid and responsible for transcriptional activity but is not transmitt...