The accurate modeling of energetic contributions to protein structure
is a fundamental challenge in computational approaches to protein
analysis and design. We describe a general computational method, EmCAST
(empirical Cα stabilization), to score and optimize the fit of a sequence
to its structure in proteins. The method relies on an empirical potential
derived from a database of Cα dihedral angle preferences
for all possible four-residue sequences, using the data available
in the Protein Data Bank. Our method produces stability predictions
that naturally correlate one-to-one with the experimental results
for solvent-exposed mutation sites. EmCAST predicted four mutations
that increased the stability of a three-helix bundle, UBA(1), from
2.4 to 4.8 kcal/mol by optimizing residues in both helices and turns.
For a set of eight variants, the predicted and experimental stabilizations
correlate very well (R² = 0.97) with a
slope near 1 and with a 0.16 kcal/mol standard error for EmCAST predictions.
Tests against literature data for the stability effects of surface-exposed
mutations show that EmCAST outperforms existing stability prediction
methods. UBA(1) variants were crystallized to verify and analyze their
structures at atomic resolution. Thermodynamic and kinetic folding
experiments were performed to determine the magnitude and mechanism
of stabilization. Our method has the potential to enable the rapid,
rational optimization of natural proteins, expand the analysis of
the sequence/structure relationship, and supplement existing protein
design strategies.
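
To make the idea of a potential built from Cα dihedral angle preferences concrete, the following is a minimal sketch and not the EmCAST implementation. It assumes a generic Boltzmann-inversion form, E = -RT ln(p_seq / p_ref), which the text above does not specify. The 30-degree bin width, the pseudocount, and the names `dihedral`, `bin_index`, and `pseudo_energy` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

RT = 0.593  # kcal/mol near 298 K


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, e.g. four
    consecutive C-alpha positions given as (x, y, z) tuples."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def bin_index(angle_deg, width=30):
    """Map an angle in [-180, 180] to a histogram bin (assumed bin width)."""
    return int((angle_deg + 180.0) // width) % (360 // width)


def pseudo_energy(seq_angles, ref_angles, width=30, pseudocount=1.0):
    """Per-bin pseudo-energy (kcal/mol) for one four-residue sequence,
    relative to a reference angle distribution. Boltzmann inversion is an
    assumption of this sketch, not a statement of the published method."""
    n_bins = 360 // width
    seq_counts = Counter(bin_index(a, width) for a in seq_angles)
    ref_counts = Counter(bin_index(a, width) for a in ref_angles)
    seq_total = len(seq_angles) + pseudocount * n_bins
    ref_total = len(ref_angles) + pseudocount * n_bins
    energies = []
    for b in range(n_bins):
        p_seq = (seq_counts[b] + pseudocount) / seq_total
        p_ref = (ref_counts[b] + pseudocount) / ref_total
        energies.append(-RT * math.log(p_seq / p_ref))
    return energies


# Placeholder angles for illustration only; these are not PDB data.
example_seq_angles = [50.0] * 8 + [-120.0] * 2
example_ref_angles = [-165.0 + 30.0 * i for i in range(12)]
example_energies = pseudo_energy(example_seq_angles, example_ref_angles)
```

In this sketch, bins that a given four-residue sequence populates more often than the reference receive negative (favorable) values, and rarely populated bins receive positive values. The pseudocount keeps the logarithm finite for empty bins.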