Background: Computational prediction of protein subcellular localization can greatly help to elucidate protein functions. Despite the existence of dozens of protein localization prediction algorithms, prediction accuracy and coverage remain low. Several ensemble algorithms have been proposed to improve prediction performance; these usually include 10 or more individual localization algorithms. However, their performance is still limited by running complexity and by redundancy among the individual prediction algorithms.
Results: This paper proposes a novel method for the rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm combines a feature-selection-based filter with a logistic regression classifier. Using a novel concept of contribution scores, we analyzed algorithm redundancy, consensus mistakes, and algorithm complementarity in the design of ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets, Yeast and Human, and compared its performance with that of current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of the individual predictors used by current ensemble algorithms, which greatly reduces computational complexity and running time. We also found that high-performance ensemble algorithms are usually composed of predictors that together cover most of the available features. Compared with the best individual predictor, our ensemble algorithm improved the AUC score from 0.558 to 0.707 on the Yeast dataset and from 0.628 to 0.646 on the Human dataset. Compared with popular weighted-voting-based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors.
Conclusions: We proposed a method for the rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only one-half to one-third of the individual predictors used by other ensemble algorithms. The results also suggest that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
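The abstract above describes an ensemble built by filtering individual predictors and combining the surviving predictors' scores with logistic regression. A minimal sketch of that idea follows, assuming a greedy forward-selection filter that keeps a predictor only if it raises the ensemble's AUC by at least a threshold. The filter, the names select_predictors and min_gain, and the toy scores are hypothetical; the sketch does not implement the paper's contribution scores, and for brevity it measures AUC on the training data, where a real evaluation would use held-out data.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_lr(X: List[List[float]], y: Sequence[int],
           lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient-descent logistic regression; returns [bias, w_1, ..., w_k]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w: List[float], X: List[List[float]]) -> List[float]:
    """Ensemble probability for each protein (row of X)."""
    return [sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) for xi in X]


def select_predictors(outputs: List[List[float]], y: Sequence[int],
                      max_k: int = 3, min_gain: float = 0.01) -> Tuple[List[int], float]:
    """Hypothetical greedy filter over individual predictors.

    outputs[p][i] is predictor p's score for protein i. A predictor is added only
    if the LR ensemble's AUC improves by at least min_gain, so redundant
    predictors are skipped and the ensemble stays small.
    """
    chosen: List[int] = []
    best = 0.5  # AUC of a random guess
    while len(chosen) < max_k:
        candidates = []
        for p in range(len(outputs)):
            if p in chosen:
                continue
            cols = chosen + [p]
            X = [[outputs[c][i] for c in cols] for i in range(len(y))]
            candidates.append((auc(predict(fit_lr(X, y), X), y), p))
        if not candidates:
            break
        score, p = max(candidates, key=lambda t: t[0])  # first best wins ties
        if score - best < min_gain:
            break
        chosen.append(p)
        best = score
    return chosen, best
```

For instance, with labels [0, 0, 0, 0, 1, 1, 1, 1], one predictor scoring [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9] (AUC 0.9375), an exact copy of it, and a constant predictor, select_predictors keeps only predictor 0: the duplicate adds no AUC, which is the kind of redundancy a minimalist ensemble aims to avoid.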
An aspect of mycotoxin biosynthesis that remains unclear is its relationship with the cellular management of reactive oxygen species (ROS). Here we conduct a comparative study of total ROS production in the wild-type strain (SU-1) of the plant pathogen and aflatoxin producer Aspergillus parasiticus and in its mutant strain AFS10, in which the aflatoxin biosynthesis pathway is blocked by disruption of the pathway regulator aflR. We show that total ROS decreases significantly faster in SU-1 than in AFS10 between 24 h and 48 h, a time window in which aflatoxin synthesis is activated and reaches peak levels in SU-1. The impact of aflatoxin synthesis on ROS alleviation correlated well with the transcriptional activation of five superoxide dismutases (SODs), a group of enzymes that protect cells from elevated levels of superoxide radicals (O2−), one class of ROS. Finally, we show that supplementing the AFS10 growth medium with aflatoxin significantly reduces total ROS only in 24 h cultures, without significantly changing SOD gene expression. Our findings show that activation of aflatoxin biosynthesis in A. parasiticus alleviates ROS generation and that this alleviation can be both aflR-dependent and aflatoxin-dependent.
Molecular mimicry between viral antigens and host proteins can produce cross-reacting antibodies that lead to autoimmunity. The coronavirus SARS-CoV-2 causes COVID-19, a disease with strikingly varied symptoms and outcomes, ranging from asymptomatic to fatal. Autoimmunity caused by cross-reacting antibodies arising from molecular mimicry between viral antigens and host proteins may provide an explanation. We therefore computationally investigated molecular mimicry between the SARS-CoV-2 Spike protein and known epitopes. We discovered molecular mimicry hotspots in Spike and highlight two examples with tentatively high autoimmune potential and implications for understanding COVID-19 complications. We show that a TQLPP motif shared by Spike and thrombopoietin has similar antibody-binding properties in both proteins. Antibodies cross-reacting with thrombopoietin may induce thrombocytopenia, a condition observed in COVID-19 patients. Another motif, ELDKY, is shared by Spike and multiple human proteins, such as PRKG1, which is involved in platelet activation and calcium regulation, and tropomyosin, which is linked to cardiac disease. Antibodies cross-reacting with PRKG1 and tropomyosin may cause known COVID-19 complications such as blood-clotting disorders and cardiac disease, respectively. Our findings illuminate COVID-19 pathogenesis and highlight the importance of considering autoimmune potential when developing therapeutic interventions, in order to reduce adverse reactions.
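The abstract does not detail how mimicry hotspots were found. As a hedged illustration of only a first, sequence-level step, the sketch below lists exact shared 5-residue motifs (of the kind TQLPP and ELDKY represent) between a viral sequence and a set of epitope sequences. The function names and all sequences are hypothetical toy strings, not real Spike or epitope sequences, and the structural and antibody-binding comparisons the study relies on are not modelled here.

```python
from typing import Dict, List, Tuple


def kmers(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k substring of seq to its 0-based start positions."""
    index: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def shared_motifs(viral: str, epitopes: Dict[str, str],
                  k: int = 5) -> List[Tuple[str, str, int, int]]:
    """Return sorted (motif, epitope_name, viral_pos, epitope_pos) for every exact shared k-mer."""
    viral_index = kmers(viral, k)
    hits = []
    for name, ep in epitopes.items():
        for motif, ep_positions in kmers(ep, k).items():
            for vp in viral_index.get(motif, []):
                for epos in ep_positions:
                    hits.append((motif, name, vp, epos))
    return sorted(hits)


# Toy, made-up sequences for illustration only.
hits = shared_motifs("AAGTQLPPKWELDKYG",
                     {"ep1": "MSTQLPPRS", "ep2": "QELDKYAL", "ep3": "GGGGGGG"})
```

Exact k-mer matching is only a cheap pre-filter; mimicry of the kind described above also depends on similar local structure and antibody-binding properties, which a string match cannot establish.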
Thrombocytopenia, characterized by a reduced platelet count, increases mortality in COVID-19 patients. We performed a computational investigation of antibody-induced cross-reactivity, arising from molecular mimicry between the SARS-CoV-2 Spike protein and human thrombopoietin (the regulator of platelet production), as a mechanism for thrombocytopenia in COVID-19 infections. The presence in both proteins of a common sequence motif with similar structure and antibody-binding properties strongly indicates molecular mimicry. Recent reports of antibodies against epitopes containing the motif, found both in COVID-19 patients and in pre-pandemic samples, offer additional support for the cross-reactivity. Altogether, this suggests that an antibody with affinity for the Spike protein can cross-react with a human protein. Considering cross-reactivity for SARS-CoV-2 is important for therapeutic intervention and when designing the next generation of COVID-19 vaccines, to avoid potential autoimmune interference.
Many proteins are sorted to multiple subcellular locations within the cell. However, computational prediction of multi-location proteins remains a challenging task. Here we applied NetLoc, an algorithm based on logistic regression and diffusion kernels, to predicting multiplex proteins and explored its capabilities and limitations. Experiments show that the overall and true success rates are 65% and 41%, respectively, for the physical protein-protein interaction (PPI) network, and 88% and 75%, respectively, for the mixed PPI network. Our study also showed that the performance of NetLoc in predicting protein localization is limited by network characteristics such as the ratio of the number of co-localized PPIs (coPPI) to the number of non-co-localized PPIs (ncPPI) and the density of annotated coPPIs in the network. For a given network with a specific number of proteins, NetLoc performance increases with a higher coPPI/ncPPI ratio and a higher density of annotated coPPIs.
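The two network characteristics named above can be computed directly from an annotated PPI network. The sketch below assumes that an interaction counts as co-localized (coPPI) when both proteins are annotated and share at least one location, and as non-co-localized (ncPPI) when both are annotated but share none; interactions involving an unannotated protein are ignored. Reading "density of annotated coPPI" as coPPIs per protein in the network is our own assumption, and the function name coppi_stats and the toy network are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def coppi_stats(edges: Iterable[Tuple[str, str]],
                locs: Dict[str, Set[str]]) -> Tuple[int, int, float, float]:
    """Return (coPPI count, ncPPI count, coPPI/ncPPI ratio, coPPIs per protein)."""
    nodes: Set[str] = set()
    co = nc = 0
    for a, b in edges:
        nodes.update((a, b))
        if a in locs and b in locs:  # only interactions with both ends annotated
            if locs[a] & locs[b]:    # at least one shared location
                co += 1
            else:
                nc += 1
    ratio = co / nc if nc else float("inf")
    density = co / len(nodes) if nodes else 0.0  # assumed definition of density
    return co, nc, ratio, density
```

On a toy network with edges A-B, B-C, C-D, A-D and D-E, where A and B share "nucleus", B and C share "cytoplasm", D is "mitochondrion" and E is unannotated, the sketch counts 2 coPPIs and 2 ncPPIs (ratio 1.0) over 5 proteins.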
We present NetLoc, a novel diffusion-kernel-based logistic regression (KLR) algorithm for predicting protein subcellular localisation using four types of protein networks: physical protein-protein interaction (PPI) networks, genetic PPI networks, mixed PPI networks and co-expression networks. NetLoc is applied to yeast protein localisation prediction. The results show that protein networks can provide rich information for protein localisation prediction, achieving an Area Under the Curve (AUC) score of 0.93. We also show that networks with high connectivity and a high percentage of co-localised PPIs lead to better prediction performance. NetLoc proved to be a very robust approach: it achieved good performance (AUC = 0.75) using only 30% of the original interactions, and an overall accuracy greater than 0.5 with only 20% annotation coverage. Compared with a previous network-feature-based prediction algorithm, which achieved AUC scores of 0.49 and 0.52 on the yeast PPI network, NetLoc achieved significantly better overall performance, with an AUC of 0.74.
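To make the diffusion-kernel idea concrete, the sketch below builds the diffusion kernel K = exp(-βL) from the graph Laplacian L = D - A, scores each protein by a kernel-weighted vote of the annotated proteins (+1 at the target location, -1 elsewhere), and fits a one-feature logistic regression on the annotated proteins. This is a simplification in the spirit of kernel logistic regression, not NetLoc's published formulation: the single net-vote feature, β = 0.1, the function names and the six-protein toy network are all assumptions. The matrix exponential uses a truncated Taylor series, which is adequate only for small graphs and small β.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def laplacian(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        L[a][b] -= 1.0
        L[b][a] -= 1.0
        L[a][a] += 1.0
        L[b][b] += 1.0
    return L


def diffusion_kernel(L: Matrix, beta: float, terms: int = 30) -> Matrix:
    """K = exp(-beta * L) via a truncated Taylor series (small graphs, small beta only)."""
    n = len(L)
    M = [[-beta * x for x in row] for row in L]
    K = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]
    for m in range(1, terms + 1):
        # term <- term @ M / m, so term == (-beta * L)^m / m!
        term = [[sum(term[i][k] * M[k][j] for k in range(n)) / m for j in range(n)]
                for i in range(n)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


def kernel_score(K: Matrix, i: int, labels: Dict[int, int]) -> float:
    """Kernel-weighted vote of annotated proteins other than i (+1 at location, -1 otherwise)."""
    return sum(K[i][j] * (1 if y else -1) for j, y in labels.items() if j != i)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def fit_predict(K: Matrix, labels: Dict[int, int],
                lr: float = 1.0, epochs: int = 2000) -> Dict[int, float]:
    """Fit 1-D logistic regression on annotated proteins, then score unannotated ones."""
    train = [(kernel_score(K, i, labels), y) for i, y in labels.items()]
    w0 = w1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in train:
            err = sigmoid(w0 + w1 * x) - y
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / len(train)
        w1 -= lr * g1 / len(train)
    return {i: sigmoid(w0 + w1 * kernel_score(K, i, labels))
            for i in range(len(K)) if i not in labels}
```

On a toy network of two triangles joined by one edge (proteins 0 to 2 and 3 to 5, bridge 2-3), with proteins 0 and 1 annotated to the target location and 3 and 4 annotated elsewhere, the unannotated protein 2 receives a higher probability than protein 5. Because every row of L sums to zero, every row of K sums to 1, which makes a convenient sanity check on the series.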
Recent studies indicate that lncRNAs play key roles in tumorigenesis and that misexpression of lncRNAs can change the expression profiles of various target genes involved in different aspects of cancer progression. However, few studies have classified multiple cancer types using lncRNA alone. In this paper, we explored the capability of lncRNA to classify cancer types using four deep neural networks: multi-layer perceptron (MLP), long short-term memory (LSTM), convolutional neural network (CNN) and deep autoencoder (DAE). For the experiments, we used RNA-seq expression values from TCGA for 8 cancer types: BLCA, CESC, COAD, HNSC, KIRP, LGG, LIHC and LUAD. The combined dataset consists of 3656 patients with expression values for 12309 lncRNAs. The accuracy of the models ranges from 94% to 98%, which indicates that lncRNA expression profiles are a better signature than mRNA expression profiles for classifying cancer types.