Protein intrinsic disorder is becoming increasingly recognized in proteomics research. While lacking structure, many regions of disorder have been associated with biological function. There are many different experimental methods for characterizing intrinsically disordered proteins and regions; nevertheless, the prediction of intrinsic disorder from amino acid sequence remains a useful strategy, especially for large-scale proteomics investigations. Here we introduce a consensus artificial neural network (ANN) prediction method, which was developed by combining the outputs of several individual disorder predictors. By eight-fold cross-validation, this meta-predictor, called PONDR-FIT, was found to improve prediction accuracy by 3 to 20%, with an average of 11%, compared to the single predictors, depending on the dataset used. Analysis of the errors shows that the worst accuracy still occurs for short disordered regions with fewer than ten residues, as well as for residues close to order/disorder boundaries. Increased understanding of the underlying mechanism by which such meta-predictors give improved predictions will likely promote the further development of protein disorder predictors. PONDR-FIT is available at www.disprot.org.
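The idea of combining several predictors can be illustrated with a much simpler stand-in than the ANN described above. The sketch below is not PONDR-FIT: it just averages per-residue scores from several predictors and applies a cutoff. The predictor names, the scores, and the 0.5 threshold are all assumptions made for illustration.

```python
from statistics import mean


def consensus_scores(predictions):
    """Average per-residue disorder scores across predictors.

    predictions maps a (hypothetical) predictor name to a list of scores
    in [0, 1], one per residue. All lists must have the same length.
    """
    lengths = {len(scores) for scores in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same number of residues")
    # zip(*...) walks the sequence residue by residue across predictors.
    return [mean(column) for column in zip(*predictions.values())]


def call_disorder(scores, threshold=0.5):
    """Label a residue as disordered when its score reaches the threshold."""
    return [score >= threshold for score in scores]


# Made-up scores for a four-residue sequence.
example = {
    "predictor_a": [0.9, 0.8, 0.4, 0.1],
    "predictor_b": [0.7, 0.6, 0.6, 0.2],
    "predictor_c": [0.8, 0.3, 0.2, 0.3],
}
averaged = consensus_scores(example)
calls = call_disorder(averaged)
```

A trained meta-predictor replaces the plain average with a learned, possibly nonlinear, combination of the same inputs.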
We present the Database of Disordered Protein Prediction (D2P2), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, were run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. D2P2 will increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are statistics and tools for browsing and comparing genomes and their disorder within the context of their position on the tree of life.
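The overlap between disordered predictions and predicted domains mentioned above can be quantified per protein. The sketch below is a minimal illustration only: it assumes regions are given as inclusive, 1-based (start, end) pairs, which is not necessarily the format of the D2P2 downloads, and the function names are hypothetical.

```python
def residues_covered(regions):
    """Return the set of residue positions covered by inclusive (start, end) regions."""
    covered = set()
    for start, end in regions:
        covered.update(range(start, end + 1))
    return covered


def overlap_fraction(disordered, domains):
    """Fraction of predicted-disordered residues that also fall inside a domain."""
    disordered_positions = residues_covered(disordered)
    if not disordered_positions:
        return 0.0
    shared = disordered_positions & residues_covered(domains)
    return len(shared) / len(disordered_positions)


# Made-up annotations for one protein.
disordered_regions = [(1, 20), (90, 100)]   # 31 residues in total
domain_regions = [(15, 95)]                 # overlaps residues 15-20 and 90-95
fraction = overlap_fraction(disordered_regions, domain_regions)
```

Using sets of positions keeps the sketch short and handles overlapping regions from different predictors without extra interval merging.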
Intrinsically disordered proteins and intrinsically disordered protein regions are highly abundant in nature. However, quantitative and qualitative measures of protein intrinsic disorder in species with known genomes are still not available. Furthermore, although a correlation between a high fraction of disordered residues and advanced species has been reported, the details of this correlation and the connection between disorder content and proteome complexity have not yet been reported. To fill this gap, we analysed entire proteomes of 3484 species from three domains of life (archaea, bacteria and eukaryotes) and from viruses. Our analysis revealed that the evolutionary process is characterized by distinctive patterns of changes in the protein intrinsic disorder content. We show here that viruses are characterized by the widest spread of proteome disorder content (the percentage of disordered residues ranges from 7.3% in human coronavirus NL63 to 77.3% in Avian carcinoma virus). For several organisms, a clear correlation is seen between their disorder contents and habitats. In multicellular eukaryotes, there is a weak correlation between the complexity of an organism (evaluated as the number of different cell types) and its overall disorder content. For both prokaryotes and eukaryotes, the disorder content is generally independent of the proteome size. However, disorder shows a sharp increase associated with the transition from prokaryotic to eukaryotic cells. This suggests that the increased disorder content in eukaryotic proteomes might be used by nature to deal with the increased cell complexity due to the appearance of the various cellular compartments.
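The disorder content quoted above is the percentage of disordered residues in a proteome. A minimal sketch of that calculation follows, assuming each protein has already been reduced to a list of per-residue boolean calls (True for disordered). The input format and function name are assumptions, not the pipeline used in the study.

```python
def disorder_content(proteome):
    """Percentage of residues predicted as disordered across a whole proteome.

    proteome is a list of proteins, each a list of booleans
    (True = residue predicted as disordered).
    """
    total_residues = sum(len(protein) for protein in proteome)
    if total_residues == 0:
        raise ValueError("proteome contains no residues")
    disordered_residues = sum(sum(protein) for protein in proteome)
    return 100.0 * disordered_residues / total_residues


# Two made-up proteins: 1 of 4 and 2 of 6 residues disordered.
toy_proteome = [
    [True, False, False, False],
    [True, True, False, False, False, False],
]
content = disorder_content(toy_proteome)
```

Pooling residues over all proteins weights long proteins more heavily than averaging per-protein percentages would, so the two definitions can give different values for the same proteome.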
Motivation: Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions that bind to protein partners via disorder-to-order transitions. MoRFs are implicated in important processes including signaling and regulation. However, only a limited number of experimentally validated MoRFs is known, which motivates development of computational methods that predict MoRFs from protein chains. Results: We introduce a new MoRF predictor, MoRFpred, which identifies all MoRF types (α, β, coil and complex). We develop a comprehensive dataset of annotated MoRFs to build and empirically compare our method. MoRFpred utilizes a novel design in which annotations generated by sequence alignment are fused with predictions generated by a Support Vector Machine (SVM), which uses a custom-designed set of sequence-derived features. The features provide information about evolutionary profiles, selected physicochemical properties of amino acids, and predicted disorder, solvent accessibility and B-factors. Empirical evaluation on several datasets shows that MoRFpred outperforms related methods: α-MoRF-Pred, which predicts α-MoRFs, and ANCHOR, which finds disordered regions that become ordered when bound to a globular partner. We show that our predicted (new) MoRF regions have non-random sequence similarity with native MoRFs. We use this observation, along with the fact that predictions with higher probability are more accurate, to identify putative MoRF regions. We also identify a few sequence-derived hallmarks of MoRFs. They are characterized by dips in the disorder predictions and higher hydrophobicity and stability when compared to adjacent (in the chain) residues. Availability: http://biomine.ece.ualberta.ca/MoRFpred/; http://biomine.ece.ualberta.ca/MoRFpred/Supplement.pdf Contact: lkurgan@ece.ualberta.ca Supplementary information: Supplementary data are available at Bioinformatics online.
The discovery of intrinsically disordered proteins (IDPs) (i.e., biologically active proteins that do not possess stable secondary and/or tertiary structures) came as a surprise, as the existence of such proteins contradicts the traditional "sequence→structure→function" paradigm. Accurate prediction of a protein's predisposition to be intrinsically disordered is a prerequisite for the further understanding of principles and mechanisms of protein folding and function, and is key to the elaboration of a new structural and functional hierarchy of proteins. Therefore, prediction of IDPs has attracted the attention of many researchers, and a number of prediction tools have been developed. Predictions of disorder, in turn, are playing major roles in directing laboratory experiments that are leading to the discovery of ever more disordered proteins, thereby creating a positive feedback loop in the investigation of these proteins. In this review of algorithms for intrinsic disorder prediction, the basic concepts of various prediction methods for IDPs are summarized, the strengths and shortcomings of many of the methods are analyzed, and the difficulties and directions of future development of IDP prediction techniques are discussed.
Abstract: Recent years witnessed increased interest in intrinsically disordered proteins and regions. These proteins and regions are abundant and possess unique structural features and a broad functional repertoire that complements ordered proteins. However, modern studies on the abundance and functions of intrinsically disordered proteins and regions are relatively limited in size and scope of their analysis. To fill this gap, we performed a broad and detailed computational analysis of over 6 million proteins from 59 archaea, 471 bacterial, 110 eukaryotic and 325 viral proteomes. We used arguably more accurate consensus-based disorder predictions, and for the first time comprehensively characterized intrinsic disorder at proteomic and protein levels from all significant perspectives, including abundance, cellular localization, functional roles, evolution, and impact on structural coverage. We show that intrinsic disorder is more abundant and has a unique profile in eukaryotes. We map disorder into archaea, bacterial and eukaryotic cells, and demonstrate that it is preferentially located in some cellular compartments. Functional analysis that considers over 1,200 annotations shows that certain functions are exclusively implemented by intrinsically disordered proteins and regions, and that some of them are specific to certain domains of life. We reveal that disordered regions are often targets for various post-translational modifications, but primarily in the eukaryotes and viruses. Using a phylogenetic tree for 14 eukaryotic and 112 bacterial species, we analyzed relations between disorder, sequence conservation and evolutionary speed. We provide a complete analysis that clearly shows that intrinsic disorder is exceptionally and uniquely abundant in each domain of life.
Keywords: Intrinsic disorder · Intrinsically disordered proteins · Intrinsically disordered regions · Cellular localization · Post-translational modifications · Evolutionary speed
Electronic supplementary material: The online version of this article (doi:10.1007/s00018-014-1661-9) contains supplementary material, which is available to authorized users.
Introduction: It is now recognized that in addition to globular, transmembrane and fibrillar proteins that are known to be characterized by unique three dimensional (3D)-structure, there is another tribe of proteins, which, being biologically functional, do not have unique 3D-structures in their native states under the physiologic conditions in vitro and in vivo [1][2][3][4][5]. The members of this novel tribe are known as intrinsically disordered proteins (IDPs). Their structures are defined as highly dynamic ensembles of flexible conformations, where sampling of a large portion of a polypeptide's available conformational space is allowed. Although IDPs and intrinsically disordered regions (IDRs) in proteins are devoid of stable 3D-structures, they possess crucial biological functions and play multiple important roles in living organisms. In fact, the conformational plasticity associated with intrins...
Background: Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) lack stable tertiary and/or secondary structure yet fulfill key biological functions. The recent recognition of IDPs and IDRs is leading to an entire field aimed at their systematic structural characterization and at determination of their mechanisms of action. Bioinformatics studies showed that IDPs and IDRs are highly abundant in different proteomes and carry out mostly regulatory functions related to molecular recognition and signal transduction. These activities complement the functions of structured proteins. IDPs and IDRs were shown to participate in both one-to-many and many-to-one signaling. Alternative splicing and posttranslational modifications are frequently used to tune IDP functionality. Several individual IDPs were shown to be associated with human diseases, such as cancer, cardiovascular disease, amyloidoses, diabetes, neurodegenerative diseases, and others. This raises questions regarding the involvement of IDPs and IDRs in various diseases.
Intrinsic disorder is highly abundant in eukaryotic genomes. In the animal kingdom, numerous intrinsically disordered proteins (IDPs) have been characterized, especially in cell signalling and transcription regulation. An intrinsically disordered region often folds into different structures, allowing an IDP to recognize and bind different partners at various binding interfaces. In contrast, there have only been a few reports of IDPs from the plant kingdom. Plant-specific GRAS proteins play critical and diverse roles in plant development and signalling and often act as integrators of signals from multiple plant growth regulatory inputs. Using computational and bioinformatics tools, we demonstrate here that the GRAS proteins are intrinsically disordered, thus forming the first functionally required unfoldome in the plant kingdom. Furthermore, the N-terminal domains of GRAS proteins are predicted to contain numerous Molecular Recognition Features (MoRFs), short interaction-prone segments that are located within extended disorder regions and are able to recognize their interacting partners and to undergo disorder-to-order transitions upon binding to these specific partners. Overlapping with the relatively conserved motifs in the N-terminal domains of GRAS proteins, these predicted MoRFs represent potential protein-protein binding sites and may be involved in molecular recognition during plant development. This study enables us to propose a conceptual framework that guides future experimental approaches to understand structure-function relationships of the entire GRAS family.