Extensive efforts have been devoted to determining the binding specificity of Src homology 3 (SH3) domains, usually on a case-by-case basis. A generic structure-based model is needed to decipher the protein recognition code of the entire domain family. In this study, we have developed a general framework that combines molecular modeling and a machine learning algorithm to capture the energetic characteristics of domain-peptide interactions and to predict the binding specificity of the SH3 domain family. Our model is not trained for individual SH3 domains; rather, it is a generic model for the entire domain family. It not only achieved satisfactory prediction accuracy but also provided structural insights into which residues are important for binding specificity.

... domain (4) that recognizes proline-rich peptides with a core motif of PXXP (P is proline and X is any amino acid) (5, 6). Peptides can bind to SH3 domains in two opposite orientations and are referred to as class I and class II peptides. These often contain the +XXPXXP and PXXPX+ motifs, respectively, where X refers to any residue and + refers to a positively charged residue. The binding specificity of an SH3 domain is determined by the amino acids flanking the core motif and has been investigated extensively for individual domains. However, a universal model to decipher the protein recognition code of the SH3 domain family has been lacking.

A generic model for the entire domain family needs to 1) provide a general framework to characterize the domain-peptide interaction and 2) reliably predict the binding specificity of each member of the domain family. Previous experimental and computational studies satisfy only one of these requirements. For example, peptide library and peptide or protein array technologies are commonly used to determine the peptide motifs recognized by a domain. These motifs are often represented as a position-specific scoring matrix (7-13).
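The class I and class II motifs and the position-specific scoring matrix (PSSM) described above can be illustrated with a short Python sketch. It is not the method of this study. It assumes that "positively charged" means Lys or Arg (His is excluded). It also assumes that a PSSM is a list of per-position residue-to-score dictionaries, with unlisted residues scoring 0. The names `find_motifs`, `pssm_score`, and `toy_pssm` are hypothetical.

```python
import re

# Assumption: "+" (positively charged) is taken to mean K or R only.
# Lookaheads let overlapping motif occurrences all be reported.
CLASS_I = re.compile(r"(?=([KR]..P..P))")   # +XXPXXP
CLASS_II = re.compile(r"(?=(P..P.[KR]))")   # PXXPX+


def find_motifs(peptide: str) -> dict[str, list[int]]:
    """Return 0-based start positions of class I and class II motif matches."""
    return {
        "class_I": [m.start() for m in CLASS_I.finditer(peptide)],
        "class_II": [m.start() for m in CLASS_II.finditer(peptide)],
    }


def pssm_score(peptide: str, pssm: list[dict[str, float]]) -> float:
    """Score a peptide by summing per-position PSSM entries.

    pssm[i][aa] is the (hypothetical) score of residue aa at position i;
    residues missing from a position are treated as neutral (score 0).
    """
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must equal PSSM length")
    return sum(position.get(aa, 0.0) for position, aa in zip(pssm, peptide))


# Toy 4-position matrix favoring a PXXP core; the values are invented.
toy_pssm = [{"P": 2.0}, {}, {}, {"P": 2.0, "A": 0.5}]

if __name__ == "__main__":
    print(find_motifs("RPLPPLP"))          # class I match starting at 0
    print(pssm_score("PLSP", toy_pssm))    # 2.0 + 0 + 0 + 2.0
```

A fixed motif or matrix like this scores each position independently. That is one reason such representations can miss the context-dependent energetics a structure-based model aims to capture.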
These approaches have limited coverage of the peptide space because the peptides tested in the experiments usually represent only a small portion of all possible peptides of a given length. In addition, a sequence motif often has unsatisfactory power to predict the interacting partners of a domain. Along the same lines, a survey of protein-protein interaction interfaces (14) also suggested that a sophisticated model, rather than a set of well-defined rules, is needed to decipher the specificity of protein recognition.

On the other hand, high-throughput technologies, such as the yeast two-hybrid assay and complex purification followed by mass spectrometry, have been used to identify protein-protein interactions. However, these methods often miss weak and transient domain-peptide interactions (15). Various computational methods have also been developed to predict the interacting partners of modular domains (16-20). For example, the SH3-SPOT method builds a position-specific contact frequency matrix based on the protein-peptide contacts in a number of crystal structures of SH3-peptide and ...
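The limited peptide-space coverage of library and array experiments can be made concrete with simple arithmetic. Over the 20 standard amino acids there are 20^n distinct linear peptides of length n. The sketch below assumes only standard residues and no modifications. The library size of 10,000 peptides is a hypothetical figure chosen for illustration, and the function names are also hypothetical.

```python
def peptide_space_size(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct linear peptides of a given length (standard residues only)."""
    return alphabet_size ** length


def fraction_covered(library_size: int, length: int) -> float:
    """Fraction of the full peptide space sampled by a library of unique peptides."""
    return library_size / peptide_space_size(length)


if __name__ == "__main__":
    for n in (5, 7, 10):
        # 20**5 = 3,200,000; 20**7 = 1,280,000,000; 20**10 = 10,240,000,000,000
        print(n, peptide_space_size(n))
    # A hypothetical 10,000-peptide library covers about 1e-9 of all 10-mers.
    print(fraction_covered(10_000, 10))
```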