A simple classification scheme that uses only the presence or absence of a protein domain architecture has been used to determine the phylogeny of 174 complete genomes. The method correctly divides the 174 taxa into Archaea, Bacteria, and Eukarya and satisfactorily sorts most of the major groups within these superkingdoms. The most challenging problem involved 119 Bacteria, many of which have reduced genomes. When a weighting factor was used that takes account of differences in genome size (number of considered folds), small-genome taxa were mostly grouped with their full-sized counterparts. Although not every organism appears exactly at its classical phylogenetic position in these trees, the agreement appears comparable with the efforts of others using sophisticated sequence analysis and/or combinations of gene content and gene order. During the course of the study, it emerged that there is a core set of ≈50 folds that is found in all 174 genomes and a single fold diagnostic of all Archaea.

fold superfamily

The advent of the era of complete genome sequences has led to a variety of approaches for determining the evolutionary history of organisms over and beyond the comparison of the sequences themselves (1-4), including the use of such features as concatenated protein sequences (5, 6), gene content (1-3, 7), gene order (8-10), and the distribution of structural folds (11-15). Such efforts have continued even though there are those who feel the construction of a unified phylogeny is a hopeless task, horizontal gene transfers having been too pervasive to allow a singular depiction (16). In this vein, it is fair to say that the resulting phylogenies have not been entirely consistent between one method and another, and certainly none on its own has resulted in a wholly satisfactory classification.
Attempts to filter out anomalies (17) or the use of combinations of various approaches (9, 10) have been more satisfactory, but incongruities remain.

The principal goal of these endeavors is to generate a phylogeny that best represents the evolutionary histories of the taxa represented, and that resolves previous incongruities. It is generally agreed that three major forces are at work in modifying the genetic information in any genome: (i) expansion (gene duplication), (ii) deletion (gene loss), and (iii) exchange (horizontal transfer) (18-22). Additionally, there must be some degree of de novo "gene genesis," the concoction of new genes by various means (23). The challenge is to find the level of informational bundling that best accounts for this combination of events.

Here we report a simple scheme that uses a structural attribute, the protein domain content, as the principal determinant of relatedness. In particular, we have focused on the fold superfamily level (FSF) as opposed to the fold grouping itself that has been used by many other workers in the past (11-15). It is a subtle but critical distinction (14). The mere presence or absence of an FSF in a genome, as opposed to its overall abundance, was ...