Diatoms are a ubiquitous class of microalgae of extreme importance for global primary productivity and for the biogeochemical cycling of minerals such as silica. However, very little is known about diatom cell biology or about their genome structure. For diatom researchers to take advantage of genomics and post-genomics technologies, it is necessary to establish a model diatom species. Phaeodactylum tricornutum is an obvious candidate because of its ease of culture and because it can be genetically transformed. Therefore, we have examined its genome composition by generating approximately 1,000 expressed sequence tags. Although more than 60% of the sequences could not be unequivocally identified by similarity to sequences in the databases, approximately 20% had high similarity with a range of genes defined functionally at the protein level. Interestingly, many of these sequences are more similar to animal than to plant counterparts. Base composition at each codon position and GC content of the genome were compared with Arabidopsis, maize (Zea mays), and Chlamydomonas reinhardtii. The distribution of GC within the coding sequences is as homogeneous in P. tricornutum as in Arabidopsis, but with a slightly higher GC content. Furthermore, we present evidence that the P. tricornutum genome is likely to be small (less than 20 Mb). Taken together, this information supports the development of this species as a model system for molecular-based studies of diatom biology. The nucleotide sequence data reported here have been deposited in the GenBank Nucleotide Sequence Database (dbEST section) under accession nos. BI306757 through BI307753.

Diatoms are important components of marine phytoplankton, particularly for the biogeochemical cycling of minerals such as silica and for global carbon fixation (Werner, 1977; Tréguer et al., 1995).
There are well over 250 genera of living diatoms, with perhaps as many as 100,000 species (Round et al., 1990). In toto, they may contribute as much as 25% of the total primary production on earth (Van Den Hoek et al., 1997). These figures illustrate the quantitative significance of diatoms for the functioning of "ecosystem Earth."

The success of diatoms is not well understood, although it is known that they are remarkably flexible in adjusting their photosynthetic reactions to allow maximal growth rates over a wide range of light intensities (Falkowski and LaRoche, 1991), and that they may perform C4 photosynthesis (Reinfelder et al., 2000). In spite of their enormous ecological importance, only recently have they begun to attract the attention of molecular biologists (Scala and Bowler, 2001). As a consequence, knowledge of genome size and structure is extremely limited and only a few genes have been isolated. By November 2001, the sequences of fewer than 70 protein-encoding nuclear genes from diatoms had been deposited in GenBank (GenBank release 126, October 15, 2001).

Diatoms are brown algae belonging to the division Heterokonta and are thought to have ar...