We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.

Keywords: consensus sequence; homology searching; multiple sequence alignment; protein blocks; sequence databanks

Improvements in the efficiency of large-scale DNA sequencing are leading to rapid increases in the number of databank sequences that lack genetic or biochemical documentation. This is clearly the case for databases of cDNA sequence fragments (Boguski et al., 1993), which are thought to represent the majority of all human protein sequences, and for databases from large genome sequencing projects, such as the sequencing of uncharacterized bacterial genomes (Nowak, 1995). Matching these unknown sequences with sequences of known function is a major goal of genome research. Meanwhile, there remains the traditional goal of detecting homologues to help understand the function of a protein of interest to a biologist. Improved methods for detecting homology in database searches aid in achieving both goals.

It is widely assumed that homology detection can be improved by utilizing multiple alignment information.
Either a single sequence query is used to search for homologues in a database of multiple sequence alignments (Henikoff & Henikoff, 1991; Attwood & Beck, 1994; Sonnhammer & Kahn, 1994) or patterns (Smith & Smith, 1990; Bairoch, 1992), or an alignment or pattern query is used to search a sequence database (Gribskov et al., 1987; Henikoff et al., 1990; Krogh et al., 1994; Neuwald & Green, 1994; Tatusov et al., 1994; Thompson et al., 1994b) (Gribskov et al., 1990; Krogh et al., 1994; Eddy, 1996). In either case, position-specific scoring matrices (PSSMs) can represent all available information in a multiple sequence alignment, and several improvements in constructing PSSMs have been introduced recently (Brown et al., 1993; Tatusov et al., 1994; Bailey & Gribskov, 1996; Henikoff & Henikoff, 1996; Sjolander et al., 1996). However, there are no comprehensive evaluation studies that demonstrate the superiority of any multiple alignment-based querying method over single sequence querying methods such as BLAST, FASTA (Pearson, 1990), and Smith-Waterman (Smith & Waterman...