http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc/
Motivation: Knowing the localization of a protein within the cell helps elucidate its role in biological processes, its function and its potential as a drug target. Thus, subcellular localization prediction is an active research area. Numerous localization prediction systems are described in the literature; some focus on specific localizations or organisms, while others attempt to cover a wide range of localizations. Results: We introduce SherLoc, a new comprehensive system for predicting the localization of eukaryotic proteins. It integrates several types of sequence and text-based features. While applying the widely used support vector machines (SVMs), SherLoc's main novelty lies in the way in which it selects its text sources and features, and integrates those with sequence-based features. We test SherLoc on previously used datasets, as well as on a new set devised specifically to test its predictive power, and show that SherLoc consistently improves on previous reported results. We also report the results of applying SherLoc to a large set of yetunlocalized proteins.
Functional characterization of every single protein is a major challenge of the post-genomic era. The large-scale analysis of a cell’s proteins, proteomics, seeks to provide these proteins with reliable annotations regarding their interaction partners and functions in the cellular machinery. An important step on this way is to determine the subcellular localization of each protein. Eukaryotic cells are divided into subcellular compartments, or organelles. Transport across the membrane into the organelles is a highly regulated and complex cellular process. Predicting the subcellular localization by computational means has been an area of vivid activity during recent years. The publicly available prediction methods differ mainly in four aspects: the underlying biological motivation, the computational method used, localization coverage, and reliability, which are of importance to the user. This review provides a short description of the main events in the protein sorting process and an overview of the most commonly used methods in this field.
Summary Dual targeting of proteins to more than one subcellular localization has been found in animals, in fungi and in plants. In the latter, ambiguous N‐terminal targeting signals have been described that result in the protein being located in both mitochondria and plastids. We have developed ambiguous targeting predictor (ATP), a machine‐learning implementation that classifies such ambiguous targeting signals. Ambiguous targeting predictor is based on a support vector machine implementation that makes use of 12 different amino acid features. Prediction results were validated using fluorescent protein fusion. Both in silico and in vivo evaluations demonstrate that ambiguous targeting predictor is useful for predicting dual targeting to mitochondria and plastids. Proteins that are targeted to both organelles by tandemly arrayed signals (so‐called twin targeting) can be predicted by both ambiguous targeting predictor and a combination of single targeting prediction tools. Comparison of ambiguous targeting predictor with previous experimental approaches, as well as in silico approaches, shows good congruence. Based on the prediction results, land plant genomes are expected to encode, on average, > 400 proteins that are located in mitochondria and plastids. Ambiguous targeting predictor is helpful for functional genome annotation and can be used as a tool to further our understanding about dual protein targeting and its evolution.
Dramatic progress in deciphering the regulatory controls in Saccharomyces cerevisiae has been enabled by the fusion of high-throughput genomics technologies with advanced sequence analysis algorithms. Sets of genes likely to function together and with similar expression profiles have been identified in diverse studies. By fusing an advanced pattern recognition algorithm for identification of transcription factor binding sites with a new method for the quantitative comparison of binding properties of transcription factors, we provide an integrated means to move from expression data to biological insights. The Yeast Regulatory Sequence Analysis system, YRSA, combines standard functions with a novel pattern characterization procedure in an intuitive interface designed for use by a broad range of scientists. The features of the system include automated retrieval of user-defined promoter sequences, binding site discovery by pattern recognition, graphical displays of the observed pattern and positions of similar sequences in the specified genes, and comparison of the new pattern against a collection of binding patterns for characterized transcription factors. The comprehensive YRSA system was used to study the regulatory mechanisms of yeast regulons. Analysis of the regulatory controls of a battery of genes induced by DNA damaging agents supports a putative mediating role for the cell-cycle checkpoint regulatory element MCB. YRSA is available at http://yrsa.cgb.ki.se. [YRSA: ancient Scandinavian name meaning old she-bear (Latin Ursus arctos = brown bear/grizzly).]
Gene regulation in higher organisms is achieved by a complex network of transcription factors (TFs). Modulating gene expression and exploring gene function are major aims in molecular biology. Furthermore, the identification of putative target genes for a certain TF serve as powerful tools for specific targeting of rational drugs.Detecting the short and variable transcription factor binding sites (TFBSs) in genomic DNA is an intriguing challenge for computational and structural biologists. Fast and reliable computational methods for predicting TFBSs on a whole-genome scale offer several advantages compared to the current experimental methods that are rather laborious and slow. Two main approaches are being explored, advanced sequence-based algorithms and structure-based methods.The aim of this review is to outline the computational and experimental methods currently being applied in the field of protein-DNA interactions. With a focus on the former, the current state of the art in modeling these interactions is discussed. Surveying sequence and structure-based methods for predicting TFBSs, we conclude that in order to achieve a sound and specific method applicable on genomic sequences it is desirable and important to bring these two approaches together.
The development of human cancer is a highly complex process and can be considered the result of several combined events, such as genetic alterations, disturbance of signal transduction, or failure of immunological surveillance. Cancer-related databases usually focus on specific fields of research, e.g., cancer genetics or cancer immunology, whereas the complexity of cancer genesis requires an integrated analysis of heterogeneous data from several sources. Here we present the cancer-associated protein database (CAP), a novel analysis system for cancer-related data. CAP integrates data from multiple external databases, augments these data with functional annotations, and offers tools for statistical analysis of these data. We have employed CAP to analyze genes that have been found to cause an autoimmune response in cancer. In particular, we explored the connection between the autoimmune response, mutations, and overexpression of these genes. Our preliminary results suggest that mutations are not significant contributors to raising an antibody response against tumor antigens, whereas overexpression seems to play a more important role. We hereby demonstrate how different types of data can be integrated and analyzed successfully, providing interesting results. As the amount of available data is growing rapidly, a combined analysis will play an important role in exploring the genetic and immunological basis of cancer. CAP is freely available at the following web site: http://www.bioinf.uni-sb.de/CAP/.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.