Proteins with intrinsically disordered regions (IDRs) are common among eukaryotes. Many IDRs interact with nucleic acids and proteins. Annotation of these interactions is supported by computational predictors, but to date only one tool that predicts interactions with nucleic acids has been released, and recent assessments demonstrate that current predictors offer modest levels of accuracy. We have developed DeepDISOBind, a deep multi-task architecture that accurately predicts deoxyribonucleic acid (DNA)-, ribonucleic acid (RNA)- and protein-binding IDRs from protein sequences. DeepDISOBind relies on an information-rich sequence profile that is processed by an innovative multi-task deep neural network, in which subsequent layers are gradually specialized to predict interactions with specific partner types. The common input layer links to a layer that differentiates protein- and nucleic acid-binding, which in turn links to layers that discriminate between DNA and RNA interactions. Empirical tests show that this multi-task design provides statistically significant gains in predictive quality across the three partner types when compared to a single-task design and to a representative selection of existing methods that covers both disorder- and structure-trained tools. Analysis of the predictions on the human proteome reveals that DeepDISOBind predictions can be encoded into protein-level propensities that accurately predict DNA- and RNA-binding proteins and protein hubs. DeepDISOBind is available at https://www.csuligroup.com/DeepDISOBind/
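The layered specialization described in this abstract can be illustrated with a minimal sketch: a shared layer feeds a protein branch and a nucleic acid branch, and the nucleic acid branch feeds separate DNA and RNA heads. Everything concrete here is an assumption made for illustration (the class name `HierarchicalMultiTaskSketch`, the layer sizes, the random untrained weights and the made-up input profile); the abstract does not specify the actual DeepDISOBind layers.

```python
import math
import random

def dense(x, weights, bias):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init_layer(rng, n_in, n_out):
    """Random weights for illustration only; a real model would be trained."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class HierarchicalMultiTaskSketch:
    """Hypothetical three-output network with gradually specialized layers."""

    def __init__(self, n_features, hidden=8, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, n_features, hidden)      # common input layer
        self.protein_branch = init_layer(rng, hidden, hidden)  # protein-binding branch
        self.nucleic_branch = init_layer(rng, hidden, hidden)  # nucleic acid-binding branch
        self.protein_head = init_layer(rng, hidden, 1)
        self.dna_head = init_layer(rng, hidden, 1)             # fed by the nucleic branch
        self.rna_head = init_layer(rng, hidden, 1)             # fed by the nucleic branch

    def predict_residue(self, features):
        shared = relu(dense(features, *self.shared))
        protein = relu(dense(shared, *self.protein_branch))
        nucleic = relu(dense(shared, *self.nucleic_branch))
        return {
            "protein": sigmoid(dense(protein, *self.protein_head)[0]),
            "dna": sigmoid(dense(nucleic, *self.dna_head)[0]),
            "rna": sigmoid(dense(nucleic, *self.rna_head)[0]),
        }

model = HierarchicalMultiTaskSketch(n_features=4)
profile = [[0.1, 0.9, 0.0, 0.3], [0.7, 0.2, 0.5, 0.1]]  # made-up per-residue profile
predictions = [model.predict_residue(row) for row in profile]
```

Each residue receives three propensities from a single forward pass, and the DNA and RNA outputs share every layer up to the nucleic acid branch, which is the sharing pattern the abstract describes.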
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and by the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Identifying differentially expressed genes is critical in microarray data analysis. Many methods have been developed that combine p-value, fold-change, and various statistical models to determine these genes. Using these methods requires setting various pre-determined cutoff values. However, many of these cutoff values are somewhat arbitrary and may not have clear connections to biology. In this study, a genetic distance method based on gene expression level was developed to analyze eight sets of microarray data extracted from the GEO database. Because the genes used in the distance calculation are ranked by fold-change, the genetic distance becomes more stable as more genes are added to the calculation, indicating that there is an optimal set of genes sufficient to characterize the stable difference between samples. The genes in this set are the differentially expressed genes, and they represent both the genotypic and phenotypic differences between samples.
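The stabilization behavior described above can be sketched as follows. The abstract does not define the distance, so this example assumes a Euclidean distance over log2 fold-changes; the function names, the `tolerance` parameter and the toy expression values are likewise hypothetical and chosen only to make the idea concrete.

```python
import math

def cumulative_distances(sample_a, sample_b):
    """Distance between two samples over the top-k genes, for k = 1..n.

    Assumption: Euclidean distance on log2 fold-changes, with genes ranked
    by absolute fold-change, largest first.
    """
    log_fc = [math.log2(b / a) for a, b in zip(sample_a, sample_b)]
    ranked = sorted(log_fc, key=abs, reverse=True)
    distances, running = [], 0.0
    for value in ranked:
        running += value * value
        distances.append(math.sqrt(running))
    return distances

def stable_gene_count(distances, tolerance=0.01):
    """Smallest k at which adding the next gene changes the distance by
    less than `tolerance` in relative terms (illustrative criterion)."""
    for k in range(1, len(distances)):
        if distances[k] - distances[k - 1] < tolerance * distances[k]:
            return k
    return len(distances)

# Toy expression levels for six genes in two samples (invented values).
sample_a = [100, 200, 50, 80, 120, 60]
sample_b = [800, 25, 55, 80, 118, 61]

distances = cumulative_distances(sample_a, sample_b)
n_stable = stable_gene_count(distances)  # 2 for this toy data
```

In the toy data only the first two genes change strongly, so the distance plateaus once they are included and the remaining genes contribute almost nothing.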
The leukemic stem cell marker CD44 has been reported to have prognostic significance in hematological malignancies. The present study therefore aimed to evaluate whether the expression levels of CD44 and the associated pathway components are associated with the survival rate of elderly patients with refractory acute myeloid leukemia (AML). A total of 20 elderly patients diagnosed with refractory AML were divided into two groups following induction chemotherapy: complete remission (CR, n=9) and non-remission (NR, n=11). Bone marrow biopsy specimens were collected, and the expression levels of CD44, phosphatase and tensin homolog (PTEN), mammalian target of rapamycin (mTOR) and nuclear factor-κB (NF-κB) were analyzed by immunohistochemistry. The captured images were analyzed in a blinded manner using Image Pro Plus software, version 6.0. The overall survival rates (OS) of the patients were then analyzed with the log-rank test, and the correlation between CD44, PTEN, mTOR and NF-κB expression levels and patients' survival rates was analyzed using Pearson's method. Significant differences were observed between the CR and NR groups for PTEN (P=0.025) and CD44 (P=0.020) expression levels. Positive CD44 expression was significantly correlated with poor overall survival, with a hazard ratio of 6.281 (95% CI, 1.78-22.12; P=0.0042). The mean OS was 4.00 months for patients that demonstrated positive CD44 expression, compared with 9.27 months for patients that demonstrated negative CD44 expression. A tendency towards reduced survival rates was also observed in patients negative for PTEN expression when compared with PTEN-positive patients. The mean OS was 4.81 months in PTEN-negative patients vs. 8.8 months in PTEN-positive patients, with a hazard ratio of 2.689 (95% CI, 0.89-8.08; P=0.078). Patients that exhibited PTEN-positive and CD44-negative expression survived significantly longer than patients that demonstrated PTEN-negative and CD44-positive expression (mean OS, 9.86 vs. 2.67 months; hazard ratio=0.037; 95% CI, 0.006-0.222; P=0.0006). The expression levels of NF-κB and mTOR were slightly increased in the NR group compared with those of the CR group, although no significant differences were identified. PTEN and CD44 expression levels demonstrated trends towards negative correlation. In conclusion, the expression levels of CD44 and PTEN may be useful markers to predict the prognosis of elderly patients with refractory AML.
APETALA2/ETHYLENE RESPONSE FACTOR transcription factors (AP2/ERFs) play crucial roles in adaptation to stresses such as those caused by pathogens, wounding and cold. Although their name suggests a specific role in ethylene signalling, some ERF members also co-ordinate signals regulated by other key plant stress hormones such as jasmonate, abscisic acid and salicylate. We analysed a set of ERF proteins from three divergent plant species for intrinsically disordered regions containing conserved segments involved in protein–protein interaction, known as Molecular Recognition Features (MoRFs). We then correlated the identified MoRFs with known functional features where these could be identified. Our analyses suggest that MoRFs, with plasticity in their disordered surroundings, are highly functional and may have been shuffled between related protein families, driven by selection. A particularly important role may be played by the alpha helical component of the structured DNA binding domain to permit specificity. We also present examples of computationally identified MoRFs that have no known function, and provide a valuable conceptual framework to link both disordered and ordered structural features within this family to diverse functions.
Motivation: Intrinsically disordered protein regions interact with proteins, nucleic acids and lipids. Regions that bind lipids are implicated in a wide spectrum of cellular functions and several human diseases. Motivated by the growing amount of experimental data for these interactions and the lack of tools that can predict them from the protein sequence, we develop DisoLipPred, the first predictor of the disordered lipid-binding residues (DLBRs). Results: DisoLipPred relies on a deep bidirectional recurrent network that implements three innovative features: transfer learning, a bypass module that sidesteps predictions for putative structured residues, and expanded inputs that cover physicochemical properties associated with the protein-lipid interactions. Ablation analysis shows that these features drive the predictive quality of DisoLipPred. Tests on an independent test dataset and the yeast proteome reveal that DisoLipPred generates accurate results and that none of the related existing tools can be used to indirectly identify DLBRs. We also show that DisoLipPred’s predictions complement the results generated by predictors of the transmembrane regions. Altogether, we conclude that DisoLipPred provides high-quality predictions of DLBRs that complement the currently available methods. Availability: DisoLipPred’s webserver is available at http://biomine.cs.vcu.edu/servers/DisoLipPred/ Supplementary information: Supplementary data are available at Bioinformatics online.
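The bypass idea mentioned in the Results can be sketched in a few lines: residues that a disorder predictor considers structured skip the expensive model and receive a default score. This is only an illustration under stated assumptions; the function names, the `threshold` value, the default score of 0.0 and the toy scorer are hypothetical and are not taken from DisoLipPred.

```python
def predict_with_bypass(disorder_scores, features, score_residue, threshold=0.5):
    """Score only putative disordered residues.

    Residues whose disorder propensity is below `threshold` are treated as
    structured and bypass `score_residue`, receiving 0.0 instead.
    """
    results = []
    for disorder, row in zip(disorder_scores, features):
        if disorder < threshold:
            results.append(0.0)  # putative structured residue: model is skipped
        else:
            results.append(score_residue(row))
    return results

calls = []

def toy_scorer(row):
    """Stand-in for a trained network; records each call and returns the row mean."""
    calls.append(row)
    return sum(row) / len(row)

disorder = [0.2, 0.8, 0.9]                      # invented disorder propensities
features = [[1.0, 1.0], [0.4, 0.6], [0.0, 1.0]]  # invented per-residue inputs
scores = predict_with_bypass(disorder, features, toy_scorer)
```

Here the first residue is bypassed, so the scorer runs only twice; in a real predictor the same gating would limit lipid-binding predictions to residues that are plausibly disordered.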