Are there dominant membrane protein families with a given number of helices?

Arkin, Isaiah T.; Brünger, A.T.; Engelman, Donald M.

doi:10.1002/(sici)1097-0134(199708)28:4<465::aid-prot1>3.0.co;2-9

Cited by 63 publications

(52 citation statements)

References 6 publications

“…Comparable results have been found by many investigators in whole-genome surveys of yeast and other completely sequenced genomes (Arkin et al, 1997; Boyd et al, 1998; Gerstein, 1997, 1998b; Gerstein & Hegyi, 1998; Goffeau et al, 1993; Jones, 1998; Rost, 1996; Rost et al, 1995; Tomb et al, 1997; Wallin & von Heijne, 1998). In particular, our membrane-prediction program, which predicted whether a protein had transmembrane helices, indicated that 22% of the proteins in the yeast genome were integral membrane (T) proteins (those with more than one transmembrane helix).…”

Section: Figure 4 - Priors (contrasting)

confidence: 99%

A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome (edited by F. Cohen)

Drawid, Gerstein

2000

Journal of Molecular Biology

135

We develop a probabilistic system for predicting the subcellular localization of proteins and estimating the relative population of the various compartments in yeast. Our system employs a Bayesian approach, updating a protein's probability of being in a compartment based on a diverse range of 30 features. These range from specific motifs (e.g. signal sequences or HDEL) to overall properties of a sequence (e.g. surface composition or isoelectric point) to whole-genome data (e.g. absolute mRNA expression levels or their fluctuations). The strength of our approach is the easy integration of many features, particularly the whole-genome expression data. We construct a training and testing set of ~1300 yeast proteins with an experimentally known localization from merging, filtering, and standardizing the annotation in the MIPS, Swiss-Prot and YPD databases, and we achieve 75% accuracy on individual protein predictions using this dataset. Moreover, we are able to estimate the relative protein population of the various compartments without requiring a definite localization for every protein. This approach, which is based on an analogy to formalism in quantum mechanics, gives greater accuracy in determining relative compartment populations than that obtained by simply tallying the localization predictions for individual proteins (on the yeast proteins with known localization, 92% vs. 74%). Our training and testing also highlights which of the 30 features are informative and which are redundant (19 being particularly useful). After developing our system, we apply it to the 4700 yeast proteins with currently unknown localization and estimate the relative population of the various compartments in the entire yeast genome. An unbiased prior is essential to this extrapolated estimate; for this, we use the MIPS localization catalogue, and adapt recent results on the localization of yeast proteins obtained by Snyder and colleagues using a minitransposon system. 
Our final localizations for all ~6000 proteins in the yeast genome are available over the web at
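The abstract's two central ideas, updating a protein's compartment probabilities feature by feature and estimating compartment populations without forcing a single localization per protein, can be illustrated with a minimal sketch. Everything named here is hypothetical: the compartment labels, the likelihood values, and the functions `bayes_update`, `localize`, `population_by_tally` and `population_by_sum` are illustrative, and features are assumed to be conditionally independent. Averaging the probability vectors is only one plausible reading of "without requiring a definite localization for every protein"; it is not claimed to be the authors' actual formalism.

```python
from typing import Dict, List

Probs = Dict[str, float]  # compartment name -> probability


def bayes_update(prior: Probs, likelihood: Probs) -> Probs:
    """One Bayesian update: posterior is proportional to prior * P(feature | compartment)."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


def localize(prior: Probs, feature_likelihoods: List[Probs]) -> Probs:
    """Apply the updates one feature at a time (assumes independent features)."""
    posterior = dict(prior)
    for likelihood in feature_likelihoods:
        posterior = bayes_update(posterior, likelihood)
    return posterior


def population_by_tally(posteriors: List[Probs]) -> Probs:
    """Assign each protein to its most probable compartment, then count."""
    counts = {c: 0.0 for c in posteriors[0]}
    for p in posteriors:
        counts[max(p, key=p.get)] += 1.0
    return {c: k / len(posteriors) for c, k in counts.items()}


def population_by_sum(posteriors: List[Probs]) -> Probs:
    """Average the probability vectors; no protein is forced into one compartment."""
    n = len(posteriors)
    return {c: sum(p[c] for p in posteriors) / n for c in posteriors[0]}


if __name__ == "__main__":
    # Hypothetical prior and feature likelihoods for a single protein.
    prior = {"nucleus": 0.5, "membrane": 0.5}
    features = [{"nucleus": 0.8, "membrane": 0.2}]
    print(localize(prior, features))

    # Three proteins that are each only weakly "nuclear".
    weak = [{"nucleus": 0.6, "membrane": 0.4}] * 3
    print(population_by_tally(weak))
    print(population_by_sum(weak))
```

With three weakly nuclear proteins, tallying assigns the whole population to the nucleus, while averaging retains the probability mass placed on the membrane. This is the kind of gap between the two estimates that the abstract's 92% vs. 74% comparison refers to.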

“…There have also been many surveys of the occurrence of membrane proteins in genomes [24,39,149,164,182,186–191]. The overall number of membrane proteins found depends somewhat on the prediction method and threshold used.…”

Section: Prediction For Characterizing Sequences Without a Structural… (mentioning)

confidence: 92%

“…Most of the membrane-protein surveys agree on this absence of 7-TM proteins in microbial genomes; some also claim to find more 6 and 12 TM proteins in bacterial genomes corresponding to well known families of transporter proteins [24,187,189,191]. In contrast, surveys of the incomplete (and highly biased) set of human sequences and the unfinished worm genome find a relative abundance of 7-TM proteins in these multi-cellular organisms [187,191].…”

Section: Prediction For Characterizing Sequences Without a Structural… (mentioning)

confidence: 99%

Comparing genomes in terms of protein structure: surveys of a finite parts list

Gerstein

1998

FEMS Microbiology Reviews

We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g. analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into "fold families." This library can be built up automatically using a structure-comparison program, and we described how important objective statistical measures are for assessing similarities within the library and between the library and genome sequences. 
After building the library, one can use it to count the number of folds in genomes, expressing the results in the form of Venn diagrams and "top-10" statistics for shared and common folds. Depending on the counting methodology employed, these statistics can reflect different aspects of the genome, such as the amount of internal duplication or gene expression. Previous analyses have shown that the common folds shared between very different microorganisms -i.e. in different kingdoms -have a remarkably similar structure, being comprised of repeated strand-helix-strand super-secondary structure units. A major difficulty with this sort of "fold-counting" is that only a small subset of the structures in a complete genome are currently known and this subset is prone to sampling bias. One way of overcoming biases is through structure prediction, which can be applied uniformly and comprehensively to a whole genome. Various investigators have, in fact, already applied many of the existing techniques for predicting secondary structure and transmembrane (TM) helices to the recently sequenced genomes. The results have ...

“…The importance of membrane proteins is illustrated by the fact that more than 75% of all pharmaceuticals are targeted to one family of membrane proteins: the G protein coupled receptors [2]. Open reading frames encoding proteins with predicted transmembrane α-helices are exceptionally abundant in sequence databases (20–50%) including the Mycoplasma genitalium [3], Haemophilus influenza [4], Methanococcus jannaschii [5] and Saccharomyces cerevisiae [6] genomes [7].…”

Section: Introduction (mentioning)

confidence: 99%