Abstract: Motivation
Eukaryotic cells contain different membrane-delimited compartments, which are crucial for the biochemical reactions necessary to sustain cell life. Recent studies showed that cells can also trigger the formation of membraneless organelles composed of phase-separated proteins to respond to various stimuli. These condensates provide new ways to control the reactions, and phase-separation proteins (PSPs) are thus revolutionizing how cellular organization is conceived. The small number …
“…We also calculated the number of overlapped PSPs predicted between any two tools and the overlapping matrix is shown in Figure 3B. For all prediction tools, the proportions of predicted PSPs ranked in the order of tier 1 > tier 2 > tier 3 > tier 4, which is in accordance with the degree of These results re-emphasize that there are only a small proportion of proteins spontaneously involved in the formation of condensates [20,26], even for the scaffolds, a large proportion of which might participate in multi-component LLPS environments.…”
Section: Analysis Of Scaffolds Regulators Clients and Granule Formi (supporting)
confidence: 62%
“…We also tested two first generation PSP prediction tools, PScore and CatGranule, which performed best among the 7 first-generation methods [13] and PSPer [20], on dataset F1+. The relationships between percent recall and total percentage of whole proteins accepted at given thresholds, for PScore, CatGranule, PSPer and our Model 1, are shown in Figure 1.…”
Section: Development Of the Psp Prediction Tool - Pspredictor (mentioning)
confidence: 99%
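The comparison described in this statement, percent recall against the percentage of all proteins accepted at a given threshold, can be illustrated with a minimal sketch. The scores, labels, and the function name `recall_and_accepted` are hypothetical and are not taken from PScore, CatGranule, PSPer, or PSPredictor. The sketch assumes only that each tool emits one score per protein and that a protein is accepted when its score meets the threshold.

```python
from typing import Sequence, Tuple


def recall_and_accepted(
    scores: Sequence[float],
    is_psp: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (percent recall, percent of all proteins accepted) at a threshold."""
    if len(scores) != len(is_psp):
        raise ValueError("scores and is_psp must have the same length")
    if not scores:
        raise ValueError("at least one protein is required")

    # Assumption: a protein is "accepted" when its score is >= threshold.
    accepted = [s >= threshold for s in scores]
    n_known = sum(is_psp)
    n_hit = sum(1 for a, p in zip(accepted, is_psp) if a and p)

    recall = 100.0 * n_hit / n_known if n_known else 0.0
    pct_accepted = 100.0 * sum(accepted) / len(scores)
    return recall, pct_accepted


# Hypothetical toy data: five proteins, three of them known PSPs.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_labels = [True, True, False, True, False]

for t in (0.2, 0.5, 0.85):
    r, a = recall_and_accepted(toy_scores, toy_labels, t)
    print(f"threshold={t:.2f} recall={r:.1f}% accepted={a:.1f}%")
```

Sweeping the threshold and plotting recall against the accepted percentage gives one curve per tool. A tool that reaches high recall while accepting a small share of all proteins is the more selective one.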
“…Each of them is based on specific protein features which are deemed as the driving force of LLPS. Specifically, PScore is based on the expected number of long-range, planar sp2 pi-pi contacts [14]; DDX4-like is based on sequence composition and residue spacing similarity to DDX4 [15]; PLAAC is based on prion-like domains [16]; LARKS is based on low-complexity aromatic-rich kinked segments [17]; R+Y is based on the proportion of arginine and tyrosine and the features of FET family proteins [18]; and CatGranule is based on statistical analysis of amino acid composition responsible for granule forming [19]. Recently, Orlando et al [20]. Their tool (PSPer) has been successfully used to predict the phase separation ability for 22 experimentally studied FUS-like proteins [18].…”
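As a rough illustration of a composition-based feature such as the arginine and tyrosine proportion mentioned in this statement, the sketch below computes the fraction of R and Y residues in a sequence. It is not the R+Y method of [18], which also uses features of FET family proteins. The function name `arg_tyr_fraction` and the toy sequence are hypothetical.

```python
def arg_tyr_fraction(sequence: str) -> float:
    """Fraction of residues that are arginine (R) or tyrosine (Y)."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Count one-letter codes R and Y, then normalise by sequence length.
    return sum(1 for aa in seq if aa in "RY") / len(seq)


# Hypothetical toy sequence, not a real protein.
toy_sequence = "GYRGRYSGGY"
print(f"R+Y fraction: {arg_tyr_fraction(toy_sequence):.2f}")
```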
The liquid-liquid phase separation (LLPS) of biomolecules in cells underpins the formation of membraneless organelles, which are condensates of protein, nucleic acid, or both, and play critical roles in cellular functions. The dysregulation of LLPS might be implicated in a number of diseases. Although the LLPS of biomolecules has been investigated intensively in recent years, knowledge of the prevalence and distribution of phase separation proteins (PSPs) still lags behind. Development of computational methods to predict PSPs is therefore of great importance for a comprehensive understanding of the biological function of LLPS. Here, a sequence-based prediction tool using machine learning for LLPS proteins (PSPredictor) was developed. Our model can achieve a maximum 10-CV accuracy of 96.03%, and performs much better in identifying new PSPs than reported PSP prediction tools. As far as we know, this is the first attempt to make a direct and more general prediction on LLPS proteins based only on sequence information.
“…Recently, a predictor of LLPS protein (PSPredictor) based on machine learning was developed [68], using the datasets in LLPSDB as a training set. It achieved a fairly high prediction accuracy and outperformed other reported prediction tools so far, which are all based on specific protein sequence features [61, 67, 69]. The well-summarized structural, functional, and detailed experimental information provided in PhaSePro makes it very useful for researchers to find complete and systematic knowledge of LLPS proteins.…”
Liquid−liquid phase separation (LLPS) of biomolecules, which underlies the formation of membraneless organelles (MLOs) or biomolecular condensates, has been investigated intensively in recent years. It contributes to the regulation of various physiological processes and related disease development. A rapidly increasing number of studies have recently focused on the biological functions of LLPS in cells and on its driving and regulating mechanisms. Based on the mounting data generated in these investigations, six databases (LLPSDB, PhaSePro, PhaSepDB, DrLLPS, RNAgranuleDB, HUMAN CELL MAP) have been developed, designed directly around LLPS studies or the component identification of MLOs. These resources are invaluable for a deeper understanding of the cellular function of biomolecular phase separation, as well as for the development of phase-separating protein prediction and design. In this review, we compare the data contents, annotations, and organization of these databases, highlight their unique features, overlaps, and fundamental differences, and discuss their suitable applications.
“…In the case of disorder prediction, such emergent properties can capture the underlying reasons of why a protein region tends to be disordered, and so help to make the disorder predictions more generally applicable. DisoMine has also already been successfully used in a pipeline for the identification of prion-like RNA-binding proteins that form liquid phase-separated condensates [15], defining the disorder content of the regions that typically constitute this class of proteins.…”
The role of intrinsically disordered protein regions (IDRs) in cellular processes has become increasingly evident in recent years. These IDRs continue to challenge structural biology experiments because they lack a well-defined conformation, and bioinformatics approaches that accurately delineate disordered protein regions remain essential for their identification and further investigation. Typically, these predictors use only the protein amino acid sequence, without taking into account likely emergent properties that are sequence context dependent, such as protein backbone dynamics. The DisoMine method predicts protein disorder with recurrent neural networks not directly from the amino acid sequence, but instead from more generic predictions of key biophysical properties, here protein dynamics, secondary structure and early folding. The tool is fast and requires only a single sequence, making it applicable for large-scale screening, including poorly studied and orphan proteins. DisoMine compares well to 10 state-of-the-art predictors, even when these use evolutionary information. DisoMine is freely available through an interactive webserver at http://bio2byte.com/disomine/