Sequence-structure relationships in proteins are highly asymmetric because many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions, we compute a directed graph of sequences and structures of proteins, which is based on 2,060 experimentally determined protein shapes from the Protein Data Bank. The directed graph is highly connected at native energies with "sinks" that attract many sequences from other folds. The sinks are rich in β-sheets. The number of sequences that transition between folds is significantly smaller than the number of sequences retained by their fold. The sequence flow into a particular protein shape from other proteins correlates with the number of sequences that match this shape in empirically determined genomes. Properties of strongly connected components of the graph are correlated with protein length and secondary structure.
protein designability | sequence capacity | structure stability | transitional sequences
As data on protein sequences and their variations become more accessible (following the abundance of large-scale sequencing and gene expression projects), it is clear that protein structures serve as evolutionary templates. Similar protein backbones are used again and again to create proteins with adjusted functions in response to environmental variations or at random. This asymmetric relationship is of considerable interest in the study of protein evolution and design and has received much attention. How many sequences fold to a common structure, or equivalently, what is the sequence capacity (or designability) of a known fold? Past theoretical and computational studies have focused primarily on the thermal stability of the proteins. The stability is estimated by an energy calculation for sequences threaded into a known structure.
The theory and calculations can be divided (roughly) into two categories: (i) general theories (1-6) and exhaustive simulations of simple model systems (7-11) and (ii) accurate and detailed modeling of a few proteins (12-16). The studies of class i provide a universal view of sequence-structure matches and their variations. Investigations of class ii made specific predictions on protein folds that are straightforward to test experimentally. The function of interest, protein designability or sequence capacity, was estimated both theoretically and computationally. However, neither class of calculation explicitly considers all structures of the Protein Data Bank (PDB) (17). Quantitative extrapolations from approximate theories, lattice models, or detailed simulations of a few proteins to other folds may not be obvious. Furthermore, collective behavior of the evolutionary process, not restricted to a single or a few proteins, may go unnoticed.
Explicit calculation of sequence capacity of all protein folds is of particular interest because genomic-scale experiments are emerging, making...