To discover all distinct human genes and to determine their patterns of expression across different cell types, developmental stages, and physiological conditions, a procedure is needed for fast, mutual comparison of hundreds of thousands (and perhaps millions) of clones from cDNA libraries, as well as for their comparison against databases of sequenced DNA. In a pilot study, 29,570 clones in duplicate from both original and normalized, directional, infant brain cDNA libraries were hybridized with 107-215 heptamer oligonucleotide probes to obtain oligonucleotide sequence signatures (OSSs). The OSSs were compared and clustered based on mutual similarity into 16,741 clusters, each corresponding to a distinct cDNA. A number of distinct cDNAs were successfully recognized by matching their 107-probe OSSs against GenBank entries, indicating the possibility of sequence recognition with only a few hundred randomly chosen oligomers.

An intermediate and currently feasible step in the Human Genome Project is the sequencing of cDNA fragments, which are referred to as expressed sequence tags (ESTs) (Adams et al. 1991). The EST strategy relies on one-at-a-time, random, direct sampling of cDNA libraries, with the result that every second to third sequence is needlessly resequenced if directly prepared libraries are used. As the number of sequenced cDNAs grows, the resequencing problem will inevitably worsen. To reduce resequencing, libraries are typically normalized by biochemical procedures (Soares et al. 1994). In the normalized libraries, the relative abundances of the most frequent and of the rarest cDNAs are equalized to a large de-