1996
DOI: 10.1016/s0006-3495(96)79210-x

The Shannon information entropy of protein sequences

Abstract: A comprehensive data base is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis …
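As a rough illustration of the k-tuplet ("letter") approach described in the abstract, the sketch below estimates block entropies from k-tuplet frequencies and differences them to obtain the conditional entropy of the next residue. This is a minimal Python sketch: the function names and the toy sequence are placeholders, and the paper's actual estimates pool a large protein sequence database rather than a single protein.

from collections import Counter
from math import log2

def block_entropy(sequence: str, k: int) -> float:
    """Shannon entropy (bits) of the distribution of overlapping k-tuplets."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conditional_entropy(sequence: str, k: int) -> float:
    """Entropy (bits) of the next residue given the preceding k-1 residues: H_k - H_(k-1)."""
    if k == 1:
        return block_entropy(sequence, 1)
    return block_entropy(sequence, k) - block_entropy(sequence, k - 1)

# Toy single-sequence example; reliable k-tuplet statistics require pooling
# a large database, since one protein badly undersamples the 20^k tuplets.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
for k in (1, 2, 3):
    print(k, round(conditional_entropy(seq, k), 3))

The conditional entropy falls as k grows, which is how correlations between neighboring residues reduce the information content below the log2(20) ≈ 4.32 bits per residue that a 20-letter alphabet theoretically allows.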

Cited by 162 publications (141 citation statements)
References 15 publications
“…These conclusions suggest that aggregation constraints may contribute to the observation of Strait and Dewey (1996) that actual protein sequences carry considerably less information than a 20-letter alphabet theoretically allows. They further suggest the importance of considering propensity to aggregate as a design constraint in protein evolution on a par with rapid folding and with stability and functional fitness of the native state.…”
Section: Results
mentioning
confidence: 97%
“…The overall statistical content of protein sequences has since been examined by Strait and Dewey (1996), who analyzed a protein sequence database in terms of its information entropy by different measures of information content. They determined that actual protein sequences carry significantly less information than is theoretically possible from a 20-letter alphabet.…”
mentioning
confidence: 99%
“…Information measures, such as entropy, have been used in recognition of DNA patterns, classification of genetic sequences, and other computational studies of genetic processes (Roman-Roldan et al., 1996; Palaniappan and Jernigan, 1984; Almagor, 1985; Schneider, 1991a, 1991b; Altschul, 1991; Salamon and Konopka, 1992; Oliver et al., 1993; DeLaVega et al., 1996; Schneider and Mastronarde, 1996; Strait and Dewey, 1996; Pavesi et al., 1997; Loewenstern and Yianilos, 1997; Schneider, 1997, 1999). Applying techniques from Coding Theory, a subfield of Information Theory, is a logical next step in the study of the information processing mechanisms of genetic systems.…”
Section: Introduction
mentioning
confidence: 99%
“…Net average information content H(P) of a DNA sequence consisting of N base-pairs is formally given by the following formula [22,23]:…”
Section: Inhomogeneity In Genetic Information
mentioning
confidence: 99%
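The formula itself is truncated in the excerpt above. Assuming it refers to the standard Shannon entropy of the base composition (an assumption here, not recovered from the excerpt), it would read

    H(P) = - \sum_{i \in \{A, T, G, C\}} p_i \log_2 p_i

where p_i is the relative frequency of base i among the N base pairs of the sequence.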