Annette Höglund scite author profile

Motivation: Knowing the localization of a protein within the cell helps elucidate its role in biological processes, its function and its potential as a drug target. Thus, subcellular localization prediction is an active research area. Numerous localization prediction systems are described in the literature; some focus on specific localizations or organisms, while others attempt to cover a wide range of localizations. Results: We introduce SherLoc, a new comprehensive system for predicting the localization of eukaryotic proteins. It integrates several types of sequence and text-based features. While applying the widely used support vector machines (SVMs), SherLoc's main novelty lies in the way in which it selects its text sources and features, and integrates those with sequence-based features. We test SherLoc on previously used datasets, as well as on a new set devised specifically to test its predictive power, and show that SherLoc consistently improves on previously reported results. We also report the results of applying SherLoc to a large set of yet unlocalized proteins.

Predicting Protein Subcellular Localization: Past, Present, and Future

Dönnes

Höglund

2004

Functional characterization of every single protein is a major challenge of the post-genomic era. The large-scale analysis of a cell’s proteins, proteomics, seeks to provide these proteins with reliable annotations regarding their interaction partners and functions in the cellular machinery. An important step toward this goal is to determine the subcellular localization of each protein. Eukaryotic cells are divided into subcellular compartments, or organelles. Transport across the membrane into the organelles is a highly regulated and complex cellular process. Predicting the subcellular localization by computational means has been an area of intense activity in recent years. The publicly available prediction methods differ mainly in four aspects, all of which matter to the user: the underlying biological motivation, the computational method used, localization coverage, and reliability. This review provides a short description of the main events in the protein sorting process and an overview of the most commonly used methods in this field.

Prediction of dual protein targeting to plant organelles

et al. 2009

Summary: Dual targeting of proteins to more than one subcellular localization has been found in animals, in fungi and in plants. In the latter, ambiguous N‐terminal targeting signals have been described that result in the protein being located in both mitochondria and plastids. We have developed ambiguous targeting predictor (ATP), a machine‐learning implementation that classifies such ambiguous targeting signals. Ambiguous targeting predictor is based on a support vector machine implementation that makes use of 12 different amino acid features. Prediction results were validated using fluorescent protein fusion. Both in silico and in vivo evaluations demonstrate that ambiguous targeting predictor is useful for predicting dual targeting to mitochondria and plastids. Proteins that are targeted to both organelles by tandemly arrayed signals (so‐called twin targeting) can be predicted by both ambiguous targeting predictor and a combination of single targeting prediction tools. Comparison of ambiguous targeting predictor with previous experimental approaches, as well as in silico approaches, shows good congruence. Based on the prediction results, land plant genomes are expected to encode, on average, > 400 proteins that are located in mitochondria and plastids. Ambiguous targeting predictor is helpful for functional genome annotation and can be used as a tool to further our understanding about dual protein targeting and its evolution.

Integrated analysis of yeast regulatory sequences for biologically linked clusters of genes

Sandelin

Höglund

Lenhard

et al. 2003

Functional & Integrative Genomics

Dramatic progress in deciphering the regulatory controls in Saccharomyces cerevisiae has been enabled by the fusion of high-throughput genomics technologies with advanced sequence analysis algorithms. Sets of genes likely to function together and with similar expression profiles have been identified in diverse studies. By fusing an advanced pattern recognition algorithm for identification of transcription factor binding sites with a new method for the quantitative comparison of binding properties of transcription factors, we provide an integrated means to move from expression data to biological insights. The Yeast Regulatory Sequence Analysis system, YRSA, combines standard functions with a novel pattern characterization procedure in an intuitive interface designed for use by a broad range of scientists. The features of the system include automated retrieval of user-defined promoter sequences, binding site discovery by pattern recognition, graphical displays of the observed pattern and positions of similar sequences in the specified genes, and comparison of the new pattern against a collection of binding patterns for characterized transcription factors. The comprehensive YRSA system was used to study the regulatory mechanisms of yeast regulons. Analysis of the regulatory controls of a battery of genes induced by DNA damaging agents supports a putative mediating role for the cell-cycle checkpoint regulatory element MCB. YRSA is available at http://yrsa.cgb.ki.se. [YRSA: ancient Scandinavian name meaning old she-bear (Latin Ursus arctos = brown bear/grizzly).]

Significantly Improved Prediction of Subcellular Localization by Integrating Text and Protein Sequence Data

Höglund

Blum

Brady

et al. 2005

Untitled

Höglund

Kohlbacher

2004

Proteome Sci

Gene regulation in higher organisms is achieved by a complex network of transcription factors (TFs). Modulating gene expression and exploring gene function are major aims in molecular biology. Furthermore, the identification of putative target genes for a certain TF serves as a powerful tool for specific targeting of rational drugs. Detecting the short and variable transcription factor binding sites (TFBSs) in genomic DNA is an intriguing challenge for computational and structural biologists. Fast and reliable computational methods for predicting TFBSs on a whole-genome scale offer several advantages compared to the current experimental methods, which are rather laborious and slow. Two main approaches are being explored: advanced sequence-based algorithms and structure-based methods. The aim of this review is to outline the computational and experimental methods currently being applied in the field of protein-DNA interactions. With a focus on the former, the current state of the art in modeling these interactions is discussed. Surveying sequence and structure-based methods for predicting TFBSs, we conclude that in order to achieve a sound and specific method applicable to genomic sequences it is desirable and important to bring these two approaches together.

Integrative analysis of cancer‐related data using CAP

et al. 2004

The development of human cancer is a highly complex process and can be considered the result of several combined events, such as genetic alterations, disturbance of signal transduction, or failure of immunological surveillance. Cancer-related databases usually focus on specific fields of research, e.g., cancer genetics or cancer immunology, whereas the complexity of cancer genesis requires an integrated analysis of heterogeneous data from several sources. Here we present the cancer-associated protein database (CAP), a novel analysis system for cancer-related data. CAP integrates data from multiple external databases, augments these data with functional annotations, and offers tools for statistical analysis of these data. We have employed CAP to analyze genes that have been found to cause an autoimmune response in cancer. In particular, we explored the connection between the autoimmune response, mutations, and overexpression of these genes. Our preliminary results suggest that mutations are not significant contributors to raising an antibody response against tumor antigens, whereas overexpression seems to play a more important role. We hereby demonstrate how different types of data can be integrated and analyzed successfully, providing interesting results. As the amount of available data is growing rapidly, a combined analysis will play an important role in exploring the genetic and immunological basis of cancer. CAP is freely available at the following web site: http://www.bioinf.uni-sb.de/CAP/.
