Are biological networks different from other large complex networks? Both large biological and nonbiological networks exhibit power-law graphs (number of nodes with degree $k$, $N(k) \sim k^{-\beta}$), yet the exponents, $\beta$, fall into different ranges. This may be because duplication of the information in the genome is a dominant evolutionary force in shaping biological networks (like gene regulatory networks and protein-protein interaction networks) and is fundamentally different from the mechanisms thought to dominate the growth of most nonbiological networks (such as the Internet). The preferential choice models used for nonbiological networks like web graphs can only produce power-law graphs with exponents greater than 2. We use combinatorial probabilistic methods to examine the evolution of graphs by node duplication processes and derive exact analytical relationships between the exponent of the power law and the parameters of the model. Both full duplication of nodes (with all their connections) as well as partial duplication (with only some connections) are analyzed. We demonstrate that partial duplication can produce power-law graphs with exponents less than 2, consistent with current data on biological networks. The power-law exponent for large graphs depends only on the growth process, not on the starting graph.
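The node duplication process described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' model as published; it assumes one simple variant in which a uniformly chosen node is copied and each of its edges is inherited independently with probability `p` (so `p = 1.0` is full duplication). The function names, the single-edge starting graph, and the choice to keep copies that inherit no edges are all assumptions made for illustration.

```python
import random
from collections import Counter


def partial_duplication_graph(n_nodes, p, seed=0):
    """Grow an undirected graph by (partial) node duplication.

    Assumed variant: at each step a uniformly random existing node is
    copied, and the copy inherits each of the parent's edges
    independently with probability p. p = 1.0 is full duplication.
    """
    rng = random.Random(seed)
    # Assumed starting graph: two nodes joined by a single edge.
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        new = len(adj)                  # label for the duplicated node
        parent = rng.randrange(new)     # node chosen for duplication
        kept = [v for v in sorted(adj[parent]) if rng.random() < p]
        adj[new] = set(kept)
        for v in kept:                  # keep the adjacency symmetric
            adj[v].add(new)
    return adj


def degree_histogram(adj):
    """Return N(k): the number of nodes having each degree k."""
    return Counter(len(neighbors) for neighbors in adj.values())


if __name__ == "__main__":
    graph = partial_duplication_graph(5000, p=0.5, seed=42)
    hist = degree_histogram(graph)
    for k in sorted(hist)[:10]:
        print(k, hist[k])
```

Plotting `hist` on log-log axes for several values of `p` gives a rough feel for how the tail of N(k) changes with the retention probability. Estimating the exponent itself needs more care than a straight-line fit to this histogram.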
A comprehensive database is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis and k-tuplet analysis give Shannon entropies of approximately 2.5 bits/amino acid. This entropy is much smaller than the value of 4.18 bits/amino acid obtained from the nonuniform composition of amino acids in proteins. The "Chou-Fasman gambler" is an algorithm based on the Chou-Fasman rules for protein structure. It uses both sequence and secondary structure information to guess at the number of possible amino acids that could appropriately substitute into a sequence. As is the case for the English language, the gambler algorithm gives significantly lower entropies than the k-tuplet analysis. Using these entropies, the number of most probable protein sequences can be calculated. The number of most probable protein sequences is much less than the number of possible sequences but is still much larger than the number of sequences thought to have existed throughout evolution. Implications of these results for mutagenesis experiments are discussed.
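To make the "letter" analysis concrete, the sketch below estimates a conditional entropy from k-tuplet (block) frequencies as the difference of successive block entropies, H_k − H_{k−1}. This is a generic textbook estimator and is only assumed to resemble the k-tuplet analysis in the abstract; the function names are hypothetical. For reference, 20 equally likely amino acids would give log2(20) ≈ 4.32 bits per residue.

```python
import math
from collections import Counter


def block_entropy(seq, k):
    """Shannon entropy (bits) of the empirical length-k block distribution."""
    if k == 0:
        return 0.0
    # Overlapping k-tuplets read along the sequence.
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(seq, k):
    """Estimate the entropy of one letter given the previous k-1 letters.

    Computed as H_k - H_{k-1}, in bits per symbol. Short sequences
    undersample long blocks, so the estimate is biased low as k grows.
    """
    return block_entropy(seq, k) - block_entropy(seq, k - 1)


if __name__ == "__main__":
    # Toy input only: each of the 20 amino-acid letters appears once.
    toy = "ACDEFGHIKLMNPQRSTVWY"
    print(block_entropy(toy, 1))
```

On real data the sequences would be pooled from a large database, and k would be kept small relative to the amount of sequence available, since the number of possible k-tuplets grows as 20^k.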
Motivation: There has been considerable interest in developing computational techniques for inferring genetic regulatory networks from whole-genome expression profiles. When expression time series data sets are available, dynamic models can, in principle, be used to infer correlative relationships between gene expression levels, which may be causal. However, because of the range of detectable expression levels and the current quality of the data, the predictive nature of such inferred, quantitative models is questionable. Network models derived from simple rate laws offer an intermediate level analysis, going beyond simple statistical analysis, but falling short of a fully quantitative description. This work shows how such network models can be constructed and describes the global properties of the networks derived from such a model. These global properties are statistically robust and provide insights into the design of the underlying network. Results: Several whole-genome expression time series data sets from yeast microarray experiments were analyzed using a Markov-modeling method (Dewey and Galas, Func. Integr. Genomics, 1, 269–278, 2001) to infer an approximation to the underlying genetic network. We found that the global statistical properties of all the resulting networks are similar. The overall structure of these biological networks is distinctly different from that of other recently studied networks such as the Internet or social networks. These biological networks show hierarchical, hub-like structures that have some properties similar to a class of graphs known as small world graphs. Small world networks exhibit local cliquishness while exhibiting strong global connectivity. In addition to the small world properties, the biological networks show a power law or scale free distribution of connectivities. An inverse power law, $N(k) \sim k^{-3/2}$, for the number of vertices (genes) with $k$ connections was observed for three different data sets from yeast.
We propose network growth models based on gene duplication events. Simulations of these models yield networks with the same combination of global graphical properties that we inferred from the expression data. Contact: Ashish_Bhan@kgi.edu, David_Galas@kgi.edu, Greg_Dewey@kgi.edu. Supplementary Information: http://www.kgi.edu/html/noncore/faculty/dewey/bioinf.pdf
A general method for estimating fluorescence resonance energy transfer between distributions of donors and acceptors on surfaces is presented. Continued fraction approximants are obtained from equivalent power series expansions of the change in quantum yield in terms of the fluorescent lifetimes or the steady-state fluorescence. These approximants provide analytic equations for the analysis of energy transfer and error bounds for the approximants. Specific approximants are derived for five models of interest for membrane biochemistry: (a) an infinite plane, (b) parallel infinite planes, (c) the surface of a sphere, (d) the surfaces of concentric spheres, and (e) the surfaces of two separated spheres. Recent experimental results in the literature are analyzed with the equations obtained.