Bi Zhao scite author profile

DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning

Proteins with intrinsically disordered regions (IDRs) are common among eukaryotes. Many IDRs interact with nucleic acids and proteins. Annotation of these interactions is supported by computational predictors, but to date, only one tool that predicts interactions with nucleic acids has been released, and recent assessments demonstrate that current predictors offer modest levels of accuracy. We have developed DeepDISOBind, an innovative deep multi-task architecture that accurately predicts deoxyribonucleic acid (DNA)-, ribonucleic acid (RNA)- and protein-binding IDRs from protein sequences. DeepDISOBind relies on an information-rich sequence profile that is processed by a multi-task deep neural network, where subsequent layers are gradually specialized to predict interactions with specific partner types. The common input layer links to a layer that differentiates protein- and nucleic acid-binding, which further links to layers that discriminate between DNA and RNA interactions. Empirical tests show that this multi-task design provides statistically significant gains in predictive quality across the three partner types when compared to a single-task design and a representative selection of the existing methods that cover both disorder- and structure-trained tools. Analysis of the predictions on the human proteome reveals that DeepDISOBind predictions can be encoded into protein-level propensities that accurately predict DNA- and RNA-binding proteins and protein hubs. DeepDISOBind is available at https://www.csuligroup.com/DeepDISOBind/

DescribePROT: database of amino acid-level protein structure and function predictions

Zhao, Katuwawala, Oldfield, et al., 2020

We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by amino acid sequence, UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

How many differentially expressed genes: A perspective from the comparison of genotypic and phenotypic distances

2018

Identifying differentially expressed genes is critical in microarray data analysis. Many methods have been developed by combining p-value, fold-change, and various statistical models to determine these genes. When using these methods, it is necessary to set up various pre-determined cutoff values. However, many of these cutoff values are somewhat arbitrary and may not have clear connections to biology. In this study, a genetic distance method based on gene expression level was developed to analyze eight sets of microarray data extracted from the GEO database. Since the genes used in the distance calculation have been ranked by fold-change, the genetic distance becomes more stable as more genes are added to the calculation, indicating that there is an optimal set of genes sufficient to characterize the stable difference between samples. This set comprises the differentially expressed genes, which represent both the genotypic and phenotypic differences between samples.

Deep learning in prediction of intrinsic disorder in proteins

Zhao, Kurgan, 2022, Computational and Structural Biotechnology Journal

Prognostic value of the expression of phosphatase and tensin homolog and CD44 in elderly patients with refractory acute myeloid leukemia

Huang et al., 2015

The leukemic stem cell marker CD44 has been reported to have prognostic significance in hematological malignancies. The present study therefore aimed to evaluate whether the expression levels of CD44 and the associated pathway components are associated with the survival rate of elderly patients with refractory acute myeloid leukemia (AML). A total of 20 elderly patients diagnosed with refractory AML were divided into two groups following induction chemotherapy: complete remission (CR, n=9) and non-remission (NR, n=11). Bone marrow biopsy specimens were collected, expression levels of CD44, phosphatase and tensin homolog (PTEN), mammalian target of rapamycin (mTOR) and nuclear factor-κB (NF-κB) were analyzed by immunohistochemistry, and the captured images were analyzed in a blinded manner using Image Pro Plus software, version 6.0. The overall survival rates (OS) of the patients were then analyzed with the log-rank test, and the correlations between CD44, PTEN, mTOR and NF-κB expression levels and patients' survival rates were statistically analyzed using Pearson's method. Significant differences were observed between the CR and NR groups for PTEN (P=0.025) and CD44 (P=0.020) expression levels. Positive CD44 expression was significantly correlated with poor overall survival, with a hazard ratio of 6.281 (95% CI, 1.78-22.12; P=0.0042). The mean OS was 4.00 months for patients that demonstrated positive CD44 expression, compared with 9.27 months for patients that demonstrated negative CD44 expression. A tendency towards reduced survival rates was also observed in patients negative for PTEN expression, when compared with that of PTEN-positive patients. The mean OS was 4.81 months in PTEN-negative patients vs. 8.8 months in PTEN-positive patients, with a hazard ratio of 2.689 (95% CI, 0.89-8.08; P=0.078). Patients that exhibited PTEN-positive and CD44-negative expression survived significantly longer than patients that demonstrated PTEN-negative and CD44-positive expression (mean OS, 9.86 vs. 2.67 months; hazard ratio=0.037; 95% CI, 0.006-0.222; P=0.0006). The expression levels of NF-κB and mTOR were slightly increased in the NR group compared with those of the CR group, although no significant differences were identified. PTEN and CD44 expression levels demonstrated trends towards negative correlation. In conclusion, the expression levels of CD44 and PTEN may be useful markers to predict the prognosis of elderly patients with refractory AML.

Computational Disorder Analysis in Ethylene Response Factors Uncovers Binding Motifs Critical to Their Diverse Functions

Sun, Malhis, Zhao, et al., 2019, IJMS

APETALA2/ETHYLENE RESPONSE FACTOR transcription factors (AP2/ERFs) play crucial roles in adaptation to stresses such as those caused by pathogens, wounding and cold. Although their name suggests a specific role in ethylene signalling, some ERF members also co-ordinate signals regulated by other key plant stress hormones such as jasmonate, abscisic acid and salicylate. We analysed a set of ERF proteins from three divergent plant species for intrinsically disordered regions containing conserved segments involved in protein–protein interaction known as Molecular Recognition Features (MoRFs). We then correlated the identified MoRFs with a number of known functional features where these could be identified. Our analyses suggest that MoRFs, with plasticity in their disordered surroundings, are highly functional and may have been shuffled between related protein families driven by selection. A particularly important role may be played by the alpha helical component of the structured DNA binding domain to permit specificity. We also present examples of computationally identified MoRFs that have no known function and provide a valuable conceptual framework to link both disordered and ordered structural features within this family to diverse function.

Surveying over 100 predictors of intrinsic disorder in proteins

Zhao, Kurgan, 2021, Expert Review of Proteomics

DisoLipPred: accurate prediction of disordered lipid-binding residues in protein sequences with deep recurrent networks and transfer learning

Katuwawala, Zhao, Kurgan, 2021

Motivation: Intrinsically disordered protein regions interact with proteins, nucleic acids and lipids. Regions that bind lipids are implicated in a wide spectrum of cellular functions and several human diseases. Motivated by the growing amount of experimental data for these interactions and lack of tools that can predict them from the protein sequence, we develop DisoLipPred, the first predictor of the disordered lipid-binding residues (DLBRs). Results: DisoLipPred relies on a deep bidirectional recurrent network that implements three innovative features: transfer learning, a bypass module that sidesteps predictions for putative structured residues, and expanded inputs that cover physicochemical properties associated with the protein-lipid interactions. Ablation analysis shows that these features drive predictive quality of DisoLipPred. Tests on an independent test dataset and the yeast proteome reveal that DisoLipPred generates accurate results and that none of the related existing tools can be used to indirectly identify DLBRs. We also show that DisoLipPred’s predictions complement the results generated by predictors of the transmembrane regions. Altogether, we conclude that DisoLipPred provides high-quality predictions of DLBRs that complement the currently available methods. Availability: DisoLipPred’s webserver is available at http://biomine.cs.vcu.edu/servers/DisoLipPred/ Supplementary information: Supplementary data are available at Bioinformatics online.
