Computational prediction of intrinsic disorder in protein sequences dates back to the late 1970s and has flourished in the last two decades. We provide a brief historical overview, and we review over 30 recent predictors of disorder. We are the first to also cover predictors of molecular functions of disorder, including 13 methods that focus on disordered linkers and disordered protein-protein, protein-RNA, and protein-DNA binding regions. We overview their predictive models, usability, and predictive performance. We highlight the newest methods and the predictors that offer strong predictive performance in recent comparative assessments. We conclude that the modern predictors are relatively accurate, enjoy widespread use, and many of them are fast. Their predictions are conveniently accessible to the end users via web servers and databases that store pre-computed predictions for millions of proteins. However, developing methods that predict the many functions of intrinsic disorder that are not yet addressed remains an outstanding challenge.
Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They are different from flexible linkers/residues because they are disordered and longer. Availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, there are no computational methods that directly predict DFLs and they can be found only indirectly by filtering predicted flexible residues with predictions of disorder. Results: We conceptualized, developed and empirically assessed a first-of-its-kind sequence-based predictor of DFLs, DFLpred. This method outputs the propensity to form DFLs for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, which are processed by a fast linear model. Our high-throughput predictor can be used on the whole-proteome scale; it needs <1 h to predict an entire proteome on a single CPU. When assessed on an independent test dataset with low sequence-identity proteins, it secures an area under the receiver operating characteristic curve equal to 0.715 and outperforms existing alternatives that include methods for the prediction of flexible linkers, flexible residues, intrinsically disordered residues and various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a large content of over 30% DFL residues. We also estimate that about 6000 DFL regions are long, with ≥30 consecutive residues. Availability and implementation: http://biomine.ece.ualberta.ca/DFLpred/. Contact: lkurgan@vcu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
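The two proteome-level statistics quoted in this abstract (the fraction of DFL residues in a protein, and long regions of ≥30 consecutive DFL residues) can be derived from per-residue propensities roughly as sketched below. This is only an illustration and not DFLpred's code: the function names, the binarization threshold of 0.5, and the toy propensity profile are all assumptions made for the example.

```python
from typing import List, Tuple


def residue_content(propensities: List[float], threshold: float) -> float:
    """Fraction of residues whose propensity is at or above the threshold."""
    if not propensities:
        return 0.0
    return sum(p >= threshold for p in propensities) / len(propensities)


def find_regions(
    propensities: List[float], threshold: float, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 0-based with end exclusive, for runs of
    consecutive residues at or above the threshold that span at least
    min_length residues."""
    regions = []
    start = None
    for i, p in enumerate(propensities):
        if p >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(propensities) - start >= min_length:
        regions.append((start, len(propensities)))
    return regions


# Hypothetical 100-residue profile: 35 high-propensity residues in one run.
toy_profile = [0.1] * 10 + [0.8] * 35 + [0.2] * 55
toy_content = residue_content(toy_profile, threshold=0.5)
toy_long_regions = find_regions(toy_profile, threshold=0.5, min_length=30)
```

With this toy profile the content is 0.35, which would count as "over 30%", and the single run of 35 residues qualifies as a long region. In practice the threshold would have to come from the predictor's own documentation.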
The cell nucleus contains a number of membrane-less organelles or intra-nuclear compartments. These compartments are dynamic structures representing liquid-droplet phases which are only slightly denser than the bulk intra-nuclear fluid. They possess different functions, have diverse morphologies, and are typically composed of RNA (or, in some cases, DNA) and proteins. We analyzed 3005 mouse proteins localized in specific intra-nuclear organelles, such as nucleolus, chromatin, Cajal bodies, nuclear speckles, promyelocytic leukemia (PML) nuclear bodies, nuclear lamina, nuclear pores, and perinuclear compartment and compared them with ~29,863 non-nuclear proteins from mouse proteome. Our analysis revealed that intrinsic disorder is enriched in the majority of intra-nuclear compartments, except for the nuclear pore and lamina. These compartments are depleted in proteins that lack disordered domains and enriched in proteins that have multiple disordered domains. Moonlighting proteins found in multiple intra-nuclear compartments are more likely to have multiple disordered domains. Protein-protein interaction networks in the intra-nuclear compartments are denser and include more hubs compared to the non-nuclear proteins. Hubs in the intra-nuclear compartments (except for the nuclear pore) are enriched in disorder compared with non-nuclear hubs and non-nuclear proteins. Therefore, our work provides support to the idea of the functional importance of intrinsic disorder in the cell nucleus and shows that many proteins associated with sub-nuclear organelles in nuclei of mouse cells are enriched in disorder. This high level of disorder in the mouse nuclear proteins defines their ability to serve as very promiscuous binders, possessing both large quantities of potential disorder-based interaction sites and the ability of a single such site to be involved in a large number of interactions.
In this study, we used a wide spectrum of bioinformatics techniques to evaluate the extent of intrinsic disorder in the complete proteomes of genotypes of four human dengue viruses (DENV), to analyze the peculiarities of disorder distribution within individual DENV proteins, and to establish potential roles for the structural disorder with respect to their functions. We show that several proteins (ER, E, 1, 2A and 4A) are predicted to be mostly ordered, whereas four proteins (C, 2k, NS3 and NS5) are expected to have high disorder levels. The profiles of disorder propensities are similar across the four genotypes, except for the NS5 protein. Cleavage sites are depleted in polymorphic sites, and have a high propensity for disorder, especially relative to neighboring residues. Disordered regions are highly polymorphic in type 1 DENV but have a relatively low number of polymorphic sites in the type 4 virus. There is a high density of polymorphisms in proteins 2A and 4A, which are depleted in disorder. Thus, a high density of polymorphism is not unique to disordered regions. Analysis of disorder/function association showed that the predominant function of the disordered regions in the DENV proteins is protein-protein interaction and binding of nucleic acids, metals and other small molecules. These regions are also associated with phosphorylation, which may regulate their function. Abbreviations: CLV, cleavage site; DENV, dengue virus; ELM, eukaryotic linear motif; IDP, intrinsically disordered protein; IDPR, intrinsically disordered protein region; MoRF, molecular recognition feature; NS, non-structural protein. Introduction: Dengue fever virus (DENV) is a member of the family Flaviviridae in the genus Flavivirus, that, among other members, includes hepatitis C virus, West Nile virus and yellow fever virus [1].
The Flavivirus genus consists of nearly 80 viruses, many of which are arthropod-borne human pathogens that cause a variety of diseases, including dengue fever, plus the associated dengue hemorrhagic fever and dengue shock syndrome, Japanese encephalitis and yellow fever [2]. There are four antigenically related serotypes of the dengue virus (DENV-1, DENV-2, DENV-3 and DENV-4). All four serotypes are known to cause the full spectrum of disease [3]. Infection with one of these serotypes provides immunity for life, but only to that serotype [4]. Therefore, persons living in a dengue endemic area are at risk of encountering secondary infection with other DENV serotypes. DENV is an arbovirus (arthropod-borne virus) that is primarily transmitted between humans and Aedes aegypti mosquitoes, which breed in domestic and peridomestic water containers. A sylvatic cycle (whereby jungle primates and mosquito vectors perpetuate the virus) has been documented in Southeast Asia and West Africa, but it is presently uncertain to what extent this cycle contributes to human infections [5]. It is believed that this virus displays enzootic maintenance cycles that involve Aedes mosquitoes, which breed in tree holes and transmit the virus between mon...
Computational prediction of intrinsically disordered proteins (IDPs) is a mature research field. These methods predict disordered residues and regions in an input protein chain. More than 60 predictors of IDPs have been developed. This unit defines computational prediction of intrinsic disorder, summarizes major types of predictors of disorder, and provides details about three accurate and recently released methods. We demonstrate the use of these methods to predict intrinsic disorder for several illustrative proteins, provide insights into how predictions should be interpreted, and quantify and discuss predictive performance. Predictions can be freely and conveniently obtained using webservers. We point to the availability of databases that provide access to annotations of intrinsic disorder determined by structural studies and putative intrinsic disorder pre-computed by computational methods. Lastly, we also summarize experimental methods that can be used to validate computational predictions. © 2017 by John Wiley & Sons, Inc.
Recent analyses indicated that autophagy can be regulated via some nuclear transcriptional networks and many important players in the autophagy and other forms of programmed cell death are known to be intrinsically disordered. To this end, we analyzed similarities and differences in the intrinsic disorder distribution of nuclear and non-nuclear proteins related to autophagy. We also looked at the peculiarities of the distribution of the intrinsically disordered autophagy-related proteins in various intra-nuclear organelles, such as the nucleolus, chromatin, Cajal bodies, nuclear speckles, promyelocytic leukemia (PML) nuclear bodies, nuclear lamina, nuclear pores, and perinucleolar compartment. This analysis revealed that the autophagy-related proteins constitute about 2.5% of the non-nuclear proteins and 3.3% of the nuclear proteins, which corresponds to a substantial enrichment by about 32% in the nucleus. Curiously, although, in general, the autophagy-related proteins share similar characteristics of disorder with a generic set of all non-nuclear proteins, chromatin and nuclear speckles are enriched in the intrinsically disordered autophagy proteins (29 and 37% of these proteins are disordered, respectively) and have high disorder content at 0.24 and 0.27, respectively. Therefore, our data suggest that some of the nuclear disordered proteins may play important roles in autophagy.
Background: Development of predictors of propensity of protein sequences for successful crystallization has been actively pursued for over a decade. A few novel methods that expanded the scope of these predictions to address additional steps of protein production and structure determination pipelines were released in recent years. The predictive performance of the current methods is modest. This is because the only input that they use is the protein sequence and because the experimental annotations of these data might be inconsistent, given that they were collected across many laboratories and centers. However, even these modest levels of predictive quality are still practical compared to the reported low success rates of crystallization, which are below 10%. We focus on another important aspect related to a high computational cost of running the predictors that offer the expanded scope. Results: We introduce a novel fDETECT webserver that provides very fast and modestly accurate predictions of the success of protein production, purification, crystallization, and structure determination. Empirical tests on two datasets demonstrate that fDETECT is more accurate than the only other similarly fast method, and similarly accurate and three orders of magnitude faster than the currently most accurate predictors. Our method predicts a single protein in about 120 milliseconds and needs less than an hour to generate the four predictions for an entire human proteome. Moreover, we empirically show that fDETECT secures similar levels of predictive performance when compared with four representative methods that only predict success of crystallization, while it also provides the other three predictions. A webserver that implements fDETECT is available at http://biomine.cs.vcu.edu/servers/fDETECT/. Conclusions: fDETECT is a computational tool that supports target selection for protein production and X-ray crystallography-based structure determination.
It offers predictive quality that matches or exceeds other state-of-the-art tools and is especially suitable for the analysis of large protein sets.
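The runtime figures quoted for fDETECT (about 120 milliseconds per protein, under an hour for a human proteome) can be cross-checked with simple arithmetic, sketched below. The function names are hypothetical, and the proteome size of 20,000 proteins is an assumed round figure for illustration; it is not a number stated in the abstract.

```python
def proteome_runtime_minutes(n_proteins: int, ms_per_protein: int) -> float:
    """Total single-threaded runtime in minutes for n_proteins predictions."""
    return n_proteins * ms_per_protein / 1000 / 60


def max_proteins_within(budget_ms: int, ms_per_protein: int) -> int:
    """Largest number of proteins that fits into the time budget."""
    return budget_ms // ms_per_protein


MS_PER_PROTEIN = 120               # from the abstract: "about 120 milliseconds"
ONE_HOUR_MS = 60 * 60 * 1000
ASSUMED_PROTEOME_SIZE = 20_000     # assumption: round figure, not from the abstract

runtime_min = proteome_runtime_minutes(ASSUMED_PROTEOME_SIZE, MS_PER_PROTEIN)
capacity = max_proteins_within(ONE_HOUR_MS, MS_PER_PROTEIN)
```

Under these assumptions the run takes 40 minutes, and a one-hour budget covers 30,000 proteins, so the "less than an hour" claim is consistent with the per-protein timing.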
Intrinsically disordered regions lack stable structure in their native conformation but are nevertheless functional and highly abundant, particularly in Eukaryotes. Disordered moonlighting regions (DMRs) are intrinsically disordered regions that carry out multiple functions. DMRs are different from moonlighting proteins that could be structured and that are annotated at the whole-protein level. DMRs cannot be identified by current predictors of functions of disorder that focus on specific functions rather than multifunctional regions. We conceptualized, designed and empirically assessed a first-of-its-kind sequence-based predictor of DMRs, DMRpred. This computational tool outputs the propensity for being in a DMR for each residue in an input protein sequence. We developed novel amino acid indices that quantify propensities for functions relevant to DMRs and used evolutionary conservation, putative solvent accessibility and intrinsic disorder derived from the input sequence to build a rich profile that is suitable to accurately predict DMRs. We processed this profile to derive innovative features that we input into a Random Forest model to generate the predictions. Empirical assessment shows that DMRpred generates accurate predictions with area under receiver operating characteristic curve = 0.86 and accuracy = 82%. These results are significantly better than the closest alternative approaches that rely on sequence alignment, evolutionary conservation and putative disorder and disorder functions. Analysis of abundance of putative DMRs in the human proteome reveals that as many as 25% of proteins may have long (>30 residues) DMRs. A webserver implementation of DMRpred is available at http://biomine.cs.vcu.edu/servers/DMRpred/.
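The two performance measures reported above, area under the receiver operating characteristic curve (AUC) and accuracy, can be computed from per-residue propensities and binary annotations as in the following minimal sketch. It uses the standard pairwise (Mann-Whitney) formulation of AUC, which is not necessarily how the DMRpred authors computed it. The function names, the 0.5 threshold, and the six-residue toy data are assumptions made for illustration only.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a randomly chosen positive residue scores
    higher than a randomly chosen negative one (ties count as half).
    Quadratic in the number of residues; fine for a small illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: List[int], scores: List[float], threshold: float) -> float:
    """Fraction of residues whose thresholded prediction matches the label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Hypothetical annotations (1 = in a DMR) and propensities for six residues.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores, threshold=0.5)
```

AUC is independent of any threshold, whereas accuracy depends on the chosen cutoff, which is why the two numbers are reported separately.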