1996
DOI: 10.1016/s0006-3495(96)79210-x

The Shannon information entropy of protein sequences

Abstract: A comprehensive data base is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis …
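As a rough illustration of the k-tuplet ("letter") approach described in the abstract, the sketch below estimates block entropies from k-tuplet frequencies and differences them to obtain the conditional entropy of the next residue. This is a minimal Python sketch: the function names and the toy sequence are placeholders, and the paper's actual estimates pool a large protein sequence database rather than a single protein.

from collections import Counter
from math import log2

def block_entropy(sequence: str, k: int) -> float:
    """Shannon entropy (bits) of the distribution of overlapping k-tuplets."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conditional_entropy(sequence: str, k: int) -> float:
    """Entropy (bits) of the next residue given the preceding k-1 residues: H_k - H_(k-1)."""
    if k == 1:
        return block_entropy(sequence, 1)
    return block_entropy(sequence, k) - block_entropy(sequence, k - 1)

# Toy single-sequence example; reliable k-tuplet statistics require pooling
# a large database, since one protein badly undersamples the 20^k tuplets.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
for k in (1, 2, 3):
    print(k, round(conditional_entropy(seq, k), 3))

The conditional entropy falls as k grows, which is how correlations between neighboring residues reduce the information content below the log2(20) ≈ 4.32 bits per residue that a 20-letter alphabet theoretically allows.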

Cited by 162 publications (141 citation statements)
References 15 publications
“…These conclusions suggest that aggregation constraints may contribute to the observation of Strait and Dewey (1996) that actual protein sequences carry considerably less information than a 20-letter alphabet theoretically allows. They further suggest the importance of considering propensity to aggregate as a design constraint in protein evolution on a par with rapid folding and with stability and functional fitness of the native state.…”
Section: Results
mentioning
confidence: 97%
“…The overall statistical content of protein sequences has since been examined by Strait and Dewey (1996), who analyzed a protein sequence database in terms of its information entropy by different measures of information content. They determined that actual protein sequences carry significantly less information than is theoretically possible from a 20-letter alphabet.…”
mentioning
confidence: 99%
“…Information measures, such as entropy, have been used in recognition of DNA patterns, classification of genetic sequences, and other computational studies of genetic processes (Roman-Roldan et al., 1996; Palaniappan and Jernigan, 1984; Almagor, 1985; Schneider, 1991a, 1991b; Altschul, 1991; Salamon and Konopka, 1992; Oliver et al., 1993; DeLaVega et al., 1996; Schneider and Mastronarde, 1996; Strait and Dewey, 1996; Pavesi et al., 1997; Loewenstern and Yianilos, 1997; Schneider, 1997, 1999). Applying techniques from Coding Theory, a subfield of Information Theory, is a logical next step in the study of the information processing mechanisms of genetic systems.…”
Section: Introduction
mentioning
confidence: 99%
“…Net average information content H(P) of a DNA sequence consisting of N base-pairs is formally given by the following formula [22,23]:…”
Section: Inhomogeneity In Genetic Information
mentioning
confidence: 99%
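The formula itself is truncated in the excerpt above. Assuming it refers to the standard Shannon entropy of the base composition (an assumption here, not recovered from the excerpt), it would read

    H(P) = - \sum_{i \in \{A, T, G, C\}} p_i \log_2 p_i

where p_i is the relative frequency of base i among the N base pairs of the sequence.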