The question of whether proteins originate from random sequences of amino acids is addressed. A statistical analysis is performed in terms of blocked and random walk values formed by binary hydrophobic assignments of the amino acids along the protein chains.

Also, recent work on simplified models suggests nonrandomness (4,5). In these studies a large number of randomly selected sequences were investigated, and it was found that only a small fraction of them folded easily into a thermodynamically stable state.

In this work we study the statistical distribution of hydrophobicity by using methods different from the run test in ref. 1. Along the same lines as in ref. 3, rather than analyzing raw sequences of hydrophobicity, we focus on the corresponding random walk representation. In this way, the analysis is more sensitive to long-range correlations along the sequence. Our analysis has been carried out using two different methods, which differ substantially from what is used in ref. 3, although the starting point is similar. First, we form block variables and study how their behavior depends on the block size. When applied to the SWISS-PROT database (6) of functional proteins, this method yields clear evidence for nonrandomness. In addition, we have performed a Fourier analysis based on the random walk representation. In this analysis we find nonrandom behavior at the wavelength corresponding to α-helix structure, as one might have expected, but also at large wavelengths.

In our analysis, we have divided the sequences into groups corresponding to different fractions of hydrophobic residues. This division is important, because the results for different groups deviate in different directions from those for random sequences. For sequences with a typical fraction of hydrophobic residues, we find that the nonrandomness can be interpreted as anticorrelations.
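The three ingredients named above (binary hydrophobicity assignment, random walk representation with block variables, and Fourier analysis) can be illustrated with a minimal sketch. It is not the procedure used in this work: the set of residues treated as hydrophobic, the subtraction of the sequence mean, the normalization by block size, and all function names are assumptions made only for illustration.

```python
import cmath
from itertools import accumulate

# Assumption: a hypothetical binary assignment; the residues counted as
# hydrophobic here are chosen for illustration only.
HYDROPHOBIC = frozenset("LIVFMW")


def to_binary(seq):
    """Map each residue to +1 (hydrophobic) or -1 (otherwise)."""
    return [1 if aa in HYDROPHOBIC else -1 for aa in seq]


def random_walk(sigma):
    """Cumulative sum of the mean-subtracted steps along the chain."""
    mean = sum(sigma) / len(sigma)
    return list(accumulate(x - mean for x in sigma))


def block_fluctuation(sigma, s):
    """Mean squared sum over non-overlapping blocks of size s, divided by s.

    Assumed normalization: for independent steps this stays roughly constant
    while s is much smaller than the chain length; anticorrelated steps make
    it shrink as s grows.
    """
    n_blocks = len(sigma) // s
    if s < 1 or n_blocks == 0:
        raise ValueError("block size must be between 1 and the chain length")
    mean = sum(sigma) / len(sigma)
    sums = [sum(sigma[b * s:(b + 1) * s]) - s * mean for b in range(n_blocks)]
    return sum(x * x for x in sums) / (n_blocks * s)


def power_spectrum(sigma):
    """Squared magnitude of the discrete Fourier transform of the walk.

    Returns values for k = 1 .. N // 2; component k has wavelength N / k.
    """
    walk = random_walk(sigma)
    n = len(walk)
    spectrum = []
    for k in range(1, n // 2 + 1):
        f = sum(r * cmath.exp(-2j * cmath.pi * k * j / n)
                for j, r in enumerate(walk))
        spectrum.append(abs(f) ** 2 / n)
    return spectrum


if __name__ == "__main__":
    sigma = to_binary("LALALALA")  # strictly alternating toy sequence
    for s in (1, 2, 4):
        print(s, block_fluctuation(sigma, s))
    print(power_spectrum(sigma))
```

For the strictly alternating toy sequence, the block fluctuation drops to zero as soon as the block size is even, which is the extreme case of anticorrelation. One possible baseline, again an assumption rather than the comparison made in this work, is to repeat the calculation on shuffled copies of each sequence, since shuffling preserves the fraction of hydrophobic residues.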
This interpretation in terms of anticorrelations emerges from a simple Ising model of antiferromagnetic interactions among the residues.

Given the impact our results might have on the issue of how permissive the protein folding process is with respect to sequence specificity, we have carried out the same analysis for a toy model (7,8), for which unbiased samples of folding and nonfolding sequences can be obtained. This model, hereafter denoted the AB model, consists of chains of two kinds of "amino acids" interacting with Lennard-Jones potentials. We have examined the behavior of 300 randomly selected chains of length 20 in this model (9). Of these, only 10% were found to have reasonable folding properties. Analyzing these sequences with the same methods as those used for the functional proteins, we obtain results that are qualitatively very similar to those for proteins with a typical fraction of hydrophobic residues. In particular, we again find deviations from random behavior that correspond to anticorrelations. One should keep in mind that the toy model chains are quite short and highly simplified as compared with functional proteins. Nevertheless, it is appealing to attempt an ...