Recognition of DNA by proteins relies on direct interactions with specific DNA functional groups, along with indirect effects that reflect the variable energetics of the response of DNA sequences to the twisting and bending distortions induced by proteins. Predicting indirect readout requires knowledge of the variations in DNA curvature and flexibility in the affected region, which we have determined for a series of DNA-binding sites for the E2 regulatory protein by using the cyclization kinetics method. We examined 16 sites that contain different noncontacted spacer sequences and vary by more than three orders of magnitude in binding affinity. For 15 of these sites, the variation in affinity was predicted within a factor of 3 by using experimental curvature and flexibility values and a statistical mechanical theory. The sole exception was traced to differential magnesium ion binding.
Because many proteins deform DNA upon binding (1, 2), it is reasonable to expect that protein-DNA association might be facilitated by enhanced ease of DNA deformation and by a match between the intrinsic DNA shape in solution and the strained DNA conformation in the complex. The lack of a simple correspondence code between amino acids and DNA bases in hydrogen bonding, or direct readout, makes a general prediction of protein-DNA affinity impossible at this time. However, variations in the contributions from indirect readout due to sequence-dependent shape and mechanical properties may be predictable, yielding a partial structural code for protein-DNA interaction (3). Many attempts to test this idea have been made during the last two decades with different proteins, such as the nucleosome (4-6), cAMP-binding protein (7), 434 repressor (8, 9), TATA box-binding protein (10), and E2 protein (11).
Among the impediments to realization of this objective is the difficulty of accurately determining the multiple DNA parameters involved, including magnitude and direction of curvature, helical twist, and bending and torsional flexibilities, for a variety of sequences corresponding to the region of indirect readout. The DNA cyclization method (1, 12, 13), in the high-throughput format we recently described (14), coupled with a statistical mechanical theory (15) for extracting the curvature and flexibility parameters from the data, provides a solution to this problem. Although the extent of variation of DNA flexibility with sequence remains controversial, recent results from the cyclization kinetics method show that a DNA sequence with high histone affinity (TATAAACGCC) has a nearly 2-fold smaller bending force constant and 35% less torsional rigidity than generic DNA (6). Also, an AT repeating sequence has 28% lower bending rigidity (14).
Experimental testing of the accuracy of prediction of indirect readout requires that the nucleotides involved not also be engaged in direct interactions.
A system that meets this requirement is the noncontacted spacer region in the DNA-binding site for the E2 protein encoded by the human papillomavirus (HPV) type 16 genome (11)....