The parasitic protozoon Trichomonas vaginalis produces multiple forms of cysteine proteinase (CP). The molecular basis for this has now been examined by cloning DNA fragments encoding CPs. Using generic degenerate oligonucleotide primers based on two well-conserved regions within the central region of all eukaryotic CPs, several polymerase chain reaction fragments were isolated from T. vaginalis genomic DNA and shown to encode different CPs. One fragment with a well-represented sequence was used as a general probe to screen a T. vaginalis cDNA library at moderate stringency and five different cDNA clones were isolated. Preliminary sequencing showed that they encoded similar but distinct CPs. In the process of confirming the 5' end of one of these cDNA clones using RACE-PCR (rapid amplification of cDNA 5' ends-polymerase chain reaction), an additional sequence encoding a different CP was identified. The corresponding clone (TvCP3) and the three longest clones from the library screen (TvCP1, TvCP2 and TvCP4) were characterized further. TvCP1 and TvCP2 were full-length and TvCP3 and TvCP4 were apparently slightly less than full-length. Comparison of the predicted amino acid sequences of the four clones showed that TvCP1 and TvCP4 are related (72% identity). TvCP2 is closer to TvCP1 (60%) and TvCP4 (65%) than is TvCP3, which has 53%, 59% and 56% identity to TvCP1, TvCP2 and TvCP4, respectively. Comparison with the sequences of other known CPs indicated that the T. vaginalis gene products all belong to the cathepsin L/cathepsin H/papain branch of the papain superfamily. The TvCP1, TvCP2 and TvCP4 sequences are related (38-45% identity) to those of CP2 of Dictyostelium discoideum, human cathepsin L, three CPs from lobster and CPs from black gram, oilseed rape and rice (oryzains α and β). TvCP3 shows less identity to the other eukaryotic CPs but is most similar to D. discoideum CP2 (38%).
The four predicted amino acid sequences share some features distinct from the majority of CPs, which suggests they might have had a common evolutionary origin. The most striking feature of sequences TvCP1, TvCP2 and TvCP3 is the apparent lack of a pre-sequence (signal sequence) for TvCP1 and very short pre-sequences for TvCP2 and TvCP3. Southern analysis indicated that the organization of the genes corresponding to the TvCP cDNAs differs. The TvCP1, TvCP2 and TvCP3 genes are single-copy, whereas the TvCP4 gene appeared to be multiple-copy. Similarly sized, single abundant transcripts were present for all four sequences. Overall, the data show that we have identified a family of genes in T. vaginalis which encode a number of CPs. In total, seven distinct sequences have been recognized. This suggests that the multiplicity of CP activities seen