Understanding the observed variability in the number of homologs of a gene is an important unsolved problem with broad implications for research into the coevolution of structure and function, gene duplication, pseudogene formation, and possibly emerging diseases. Here, we attempt to define and elucidate some possible causes of the observed irregularity in sequence space. We present evidence that the sequence variability and functional diversity of a gene or fold family are influenced by quantifiable characteristics of the protein structure. These characteristics reflect the structural potential for sequence plasticity, i.e., the ability to accept mutations without losing thermodynamic stability. We identify a structural feature of a protein domain, its contact density, that serves as a determinant of entropy in sequence space, i.e., of the ability of a protein to accept mutations without destroying the fold (also known as fold designability). We show that the logarithm of average gene family size correlates statistically (R² > 0.9) with the contact density of the family's three-dimensional structure. We present evidence that the sizes of individual gene families are influenced not only by the designability of the structure but also by evolutionary history, e.g., the length of time the gene family has existed. We further show that the observed correlation between gene family size and structural contact density holds at many levels of evolutionary divergence, i.e., not only for closely related sequences but also for the more distantly related fold and superfamily levels of homology.

Gene family and domain-fold family sizes are known to vary widely (Finkelstein and Ptitsyn 1987; Finkelstein et al. 1995; Orengo et al. 1999; Teichmann et al. 1999; Yanai et al. 2000; Vitkup et al. 2001; Koonin et al. 2002), from orphans (families that have only a single member) to large sets of far-diverged homologs.
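The central quantitative claim above, that the logarithm of average family size is linear in contact density, can be made concrete with a short sketch. The code below is illustrative only and is not the authors' pipeline: this section does not define how contacts are counted or how families are grouped, so the following are assumptions. A contact is taken to be a residue pair whose C-alpha atoms lie within 7.5 Å and that is separated by at least two positions in sequence; contact density is contacts per residue; families are binned by contact density with a hypothetical bin width; and the binned mean sizes are fit by ordinary least squares on a natural-log scale. The helper names `contact_density`, `binned_average_sizes`, and `fit_log_size` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_density(ca_coords: Sequence[Coord], cutoff: float = 7.5,
                    min_seq_sep: int = 2) -> float:
    """Contacts per residue (assumed definition): count pairs (i, j) with
    j - i >= min_seq_sep whose C-alpha atoms are within `cutoff` angstroms."""
    n = len(ca_coords)
    if n == 0:
        raise ValueError("structure has no residues")
    contacts = 0
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts += 1
    return contacts / n


def binned_average_sizes(densities: Sequence[float], family_sizes: Sequence[float],
                         bin_width: float = 0.25) -> Tuple[List[float], List[float]]:
    """Group families into contact-density bins (hypothetical bin width) and
    return each bin's center and mean family size, ordered by density."""
    bins: Dict[int, List[float]] = {}
    for d, s in zip(densities, family_sizes):
        bins.setdefault(math.floor(d / bin_width), []).append(s)
    centers: List[float] = []
    means: List[float] = []
    for key in sorted(bins):
        centers.append((key + 0.5) * bin_width)
        means.append(sum(bins[key]) / len(bins[key]))
    return centers, means


def fit_log_size(densities: Sequence[float],
                 sizes: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares of ln(size) on density; returns (slope, intercept, R^2)."""
    xs = list(densities)
    ys = [math.log(s) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("densities must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2
```

A typical use would compute `contact_density` for one representative structure per family, pass the densities and family sizes to `binned_average_sizes`, and then call `fit_log_size` on the bin centers and means. Because changing the logarithm's base only rescales the response, R² does not depend on it. Fitting binned averages rather than individual families also smooths family-to-family scatter, so the two fits need not give the same R².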
The observed variability in the number and divergence of gene family members raises many questions, e.g., which genetic mechanisms and evolutionary dynamics could have produced this unevenness? Evolutionary biologists have proposed models to explain these size distributions (which often follow power laws) (Yanai et al. 2000; Dokholyan et al. 2002; Koonin et al. 2002; Deeds et al. 2003) while assuming from the outset that there are no inherent physical differences between gene families (Huynen and van Nimwegen 1998; Qian et al. 2001; Dokholyan et al. 2002; Koonin et al. 2002). However, many of these models are too abstract to explain family size distributions constructively, that is, by relating specific features of gene families to their reported sizes. Nor do these models provide explicit insight into the mechanistic details that might explain the observed differences. On the other hand, some researchers have hypothesized that the heterogeneity in family size is due to an underlying distribution of biological or physical properties (Finkelstein et al. 1995; Govindarajan and Goldstein 1996; Li et al. 1996...