We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence. The method performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server.
Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain You may freely distribute the URL identifying the publication in the public portal If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.
We present a neural network based method~ChloroP! for identifying chloroplast transit peptides and their cleavage sites. Using cross-validation, 88% of the sequences in our homology reduced training set were correctly classified as transit peptides or nontransit peptides. This performance level is well above that of the publicly available chloroplast localization predictor PSORT. Cleavage sites are predicted using a scoring matrix derived by an automatic motif-finding algorithm. Approximately 60% of the known cleavage sites in our sequence collection were predicted to within 62 residues from the cleavage sites given in SWISS-PROT. An analysis of 715 Arabidopsis thaliana sequences from SWISS-PROT suggests that the ChloroP method should be useful for the identification of putative transit peptides in genome-wide sequence data. The ChloroP predictor is available as a web-server at http:00www.cbs.dtu.dk0services0 ChloroP0.
We have carried out detailed statistical analyses of integral membrane proteins of the helix-bundle class from eubacterial, archaean, and eukaryotic organisms for which genome-wide sequence data are available. Twenty to 30% of all ORFs are predicted to encode membrane proteins, with the larger genomes containing a higher fraction than the smaller ones. Although there is a general tendency that proteins with a smaller number of transmembrane segments are more prevalent than those with many, uni-cellular organisms appear to prefer proteins with 6 and 12 transmembrane segments, whereas Caenorhabditis elegans and Homo sapiens have a slight preference for proteins with seven transmembrane segments. In all organisms, there is a tendency that membrane proteins either have many transmembrane segments with short connecting loops or few transmembrane segments with large extra-membraneous domains. Membrane proteins from all organisms studied, except possibly the archaeon Methanococcus jannaschii, follow the so-called "positiveinside" rule; Le., they tend to have a higher frequency of positively charged residues in cytoplasmic than in extracytoplasmic segments.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.