An algorithm for protein engineering, termed recursive ensemble mutagenesis, has been developed to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Starting from partially randomized "wild-type" DNA sequences, a highly parallel search of sequence space for peptides fitting an experimenter's criteria is performed. Each iteration uses information gained from the previous rounds to search the space more efficiently. Simulations of the technique indicate that, under a variety of conditions, the algorithm can rapidly produce a diverse population of proteins fitting specific criteria. In the experimental analog, genetic selection or screening applied during recursive ensemble mutagenesis should force the evolution of an ensemble of mutants to a targeted cluster of related phenotypes.

One might envision the solution to the protein folding problem as a complex algorithm that accurately predicts the three-dimensional structure and function of proteins from primary amino acid sequence data. Although some progress has been made in predicting the secondary structure of proteins (1-9), no algorithm exists that can decode an amino acid sequence into a three-dimensional structure and predict the chemical properties of the resultant protein. If such an algorithm did exist, one could design and engineer proteins to solve problems in fields as diverse as industrial catalysis (10), bioremediation (11), and medicine (12). Although much progress has been made toward a semiempirical solution to the protein folding problem, a complete solution to this quandary seems to lie in the distant future (1).

From the viewpoint of a molecular geneticist, and in the absence of a solution to the protein folding problem, how does one effectively search for specific proteins with desired structures and functions?
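To make the feedback loop described in the abstract concrete, the following is a minimal toy sketch, not the simulation used in this work. It assumes a hypothetical black-box screen (`passes_screen`, which here accepts peptides that are almost entirely hydrophobic) and a simplified feedback rule (`update_pools`, which restricts each site to the residues observed among the positives of the previous round). All function names, the screen itself, and the parameter defaults are illustrative assumptions; the step of encoding each residue pool as a degenerate codon for cassette synthesis is omitted.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def passes_screen(seq):
    """Hypothetical black-box screen (an assumption for illustration only):
    accept a peptide if at most one residue is non-hydrophobic."""
    hydrophobic = set("AILMFVW")
    return sum(aa in hydrophobic for aa in seq) >= len(seq) - 1


def rem_round(pools, library_size, rng):
    """Sample a library from the per-site residue pools and screen it.
    Returns the list of sequences that pass the screen (the "positives")."""
    library = [
        "".join(rng.choice(pool) for pool in pools)
        for _ in range(library_size)
    ]
    return [seq for seq in library if passes_screen(seq)]


def update_pools(pools, positives):
    """Feedback step (simplified): restrict each site to the residues
    observed at that site among the positives. Pools are left unchanged
    if the round produced no positives."""
    if not positives:
        return pools
    return [
        "".join(sorted({seq[i] for seq in positives}))
        for i in range(len(pools))
    ]


def simulate_rem(n_sites=8, library_size=10_000, rounds=3, seed=0):
    """Run several rounds of sample -> screen -> feedback.
    Returns the final pools and the fraction of positives in each round."""
    rng = random.Random(seed)
    pools = [AMINO_ACIDS] * n_sites  # round 1: fully randomized sites
    history = []
    for _ in range(rounds):
        positives = rem_round(pools, library_size, rng)
        history.append(len(positives) / library_size)
        pools = update_pools(pools, positives)
    return pools, history


if __name__ == "__main__":
    final_pools, fractions = simulate_rem()
    for round_no, frac in enumerate(fractions, start=1):
        print(f"round {round_no}: fraction of positives = {frac:.4f}")
    print("final per-site pools:", final_pools)
```

In this toy setting the fraction of positives would be expected to rise after the first feedback step, because each site's pool is narrowed toward residues that were compatible with the screen. The uniform sampling within each pool is a design simplification; a weighting by observed residue frequency would be an equally plausible choice.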
In this communication, we present computer simulations of an algorithm that efficiently searches "sequence space" (13) for proteins with specified properties, while treating the protein folding problem as a black box. Each step in this simulated process is directly analogous to a standard laboratory procedure: DNA synthesis, cloning, expression, screening, and sequencing. Both the simulated process and the putative experimental process are termed "recursive" because information gained in one round of mutagenesis is used to control the next. We have been successful in simulating recursive ensemble mutagenesis (REM) on as many as eight interactive amino acid sites and have embarked on analogous experiments with model proteins.

Simultaneously randomizing eight amino acid positions in a protein leads to a sequence complexity of 20^8, or >25 billion different sequences. The principal advantage of REM over other mutagenesis methods is that one can assay a relatively small volume (e.g., 10,000 mutants) in this large sequence space, find a few "positives," and generate a much larger fr...