One hundred forty-five full-length aldehyde dehydrogenase-related sequences were aligned to determine relationships within the aldehyde dehydrogenase (ALDH) extended family. The alignment reveals only four invariant residues: two glycines, a phenylalanine involved in NAD binding, and a glutamic acid that coordinates the nicotinamide ribose in certain E-NAD binary complex crystal structures, but which may also serve as a general base for the catalytic reaction. The cysteine that provides the catalytic thiol and its closest neighbor in space, an asparagine residue, are conserved in all ALDHs with demonstrated dehydrogenase activity. Sixteen residues are conserved in at least 95% of the sequences; 12 of these cluster into seven sequence motifs conserved in almost all ALDHs. These motifs cluster around the active site of the enzyme. Phylogenetic analysis of these ALDHs indicates at least 13 ALDH families, most of which have previously been identified but not grouped separately by alignment. ALDHs cluster into two main trunks of the phylogenetic tree. The larger, the "Class 3" trunk, contains mostly substrate-specific ALDH families, as well as the class 3 ALDH family itself. The other trunk, the "Class 1/2" trunk, contains mostly variable-substrate ALDH families, including the class 1 and 2 ALDH families. Divergence of the substrate-specific ALDHs occurred earlier than the division between ALDHs with broad substrate specificities. A site on the World Wide Web has also been devoted to this alignment project.

Keywords: aldehyde dehydrogenase (EC 1.2.1.3); multiple sequence alignment; protein family

Aldehyde dehydrogenases (ALDHs) catalyze the oxidation of aldehydes to their corresponding carboxylic acids and occur throughout all phyla. Many disparate aldehydes are ubiquitous in nature and most are toxic at low levels because of their chemical reactivity. Thus, levels of metabolic-intermediate aldehydes must be carefully regulated.
To this end, most well-studied organisms are known to have several distinct ALDHs, which take part in a variety of physiological roles. Some ALDHs are highly specific for a very limited range of substrates, while others show a broad substrate specificity. All ALDHs require either NAD or NADP as a cofactor (reviewed in Lindahl, 1992; Yoshida et al., 1998).

Within a decade of the first ALDH sequence, alignment of 16 of the then most divergent ALDH sequences (Hempel et al., 1993) supported a common, conserved ALDH structure and suggested residues with important structural and functional roles, similar to findings in other enzyme families (Jörnvall, 1977; Brändén & Tooze, 1991; Creighton, 1993). Just recently, the first two ALDH tertiary structures have been reported (Liu et al., 1997; Steinmetz et al., 1997). Since many forthcoming studies on ALDHs will depend on dissection of these molecular structures, it is useful to "take a step back" and examine the ALDH extended family as a whole, allowing information based on the known tertiary structures to be more readily applied to other, more diverse ALDHs. In...