Background: Essential proteins play an indispensable role in cellular survival and development. A series of biological experimental methods exist for finding essential proteins; however, they are time-consuming, expensive and inefficient. To overcome these shortcomings, many computational methods have been proposed to predict essential proteins. These methods can be roughly divided into two categories: topology-based and sequence-based. The former use the topological features of protein-protein interaction (PPI) networks, while the latter use the sequence features of proteins. Nevertheless, improving the prediction accuracy of computational methods remains challenging. Results: Compared with nonessential proteins, essential proteins appear more frequently in certain subcellular locations and are more evolutionarily conserved. By integrating information on subcellular localization, orthologous proteins and PPI networks, we propose a novel essential protein prediction method, named SON. Experimental results on S. cerevisiae data show that the prediction accuracy of SON clearly exceeds that of nine competing methods: DC, BC, IC, CC, SC, EC, NC, PeC and ION. Conclusions: We demonstrate that integrating information on subcellular localization and orthologous proteins with PPI networks improves the accuracy of essential protein prediction. Our proposed method, SON, is effective for predicting essential proteins.
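To make the topology-based category concrete, the following is a minimal sketch of the simplest baseline in the comparison list, DC (read here as degree centrality), which ranks proteins by their number of interaction partners. The edge list and protein names are hypothetical toy data, and the sketch illustrates only this baseline, not the SON method itself.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {protein: number of distinct interaction partners}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def rank_by_degree(edges, top_k):
    """Return the top_k proteins by degree (ties broken by name)."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


# Hypothetical toy PPI network (names are placeholders, not real proteins).
toy_ppi = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
top_candidates = rank_by_degree(toy_ppi, 2)
```

In practice, predicted candidates from such a ranking are compared against a list of known essential proteins; the abstract does not state the evaluation protocol, so none is assumed here.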
It is generally acknowledged that driver pathways play a decisive role in the occurrence and progression of tumors, and identifying them has become imperative for precision or personalized medicine. Because sequencing errors are inevitable, the noise contained in single-omics cancer data usually has a negative effect on identification. Now that large amounts of multi-omics cancer data have become available, using multi-omics data rather than a single data type is a feasible approach. The identification of driver pathways by integrating multi-omics cancer data has recently attracted the attention of bioinformatics researchers. In this paper, a weighted non-binary mutation matrix is constructed by integrating copy number variations, somatic mutations and gene expression. Based on this matrix, a new identification model is proposed by defining new measures of coverage and exclusivity. A cooperative coevolutionary algorithm, CGA-MWS, is then put forward to solve the model. Both real and simulated cancer data were used to compare the methods Dendrix, GA, iMCMC, MOGA, PGA-MWS and CGA-MWS. In most cases, the pathway identified by CGA-MWS contains more genes enriched in a known signaling pathway than the pathways identified by the other five methods. In addition, the high efficiency of CGA-MWS makes it practical for realistic applications. These findings have been verified through a number of experiments.
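For orientation, the sketch below shows the classic coverage and exclusivity trade-off on a binary mutation matrix, as used by the Dendrix baseline: a gene set is scored by its coverage minus its coverage overlap. This is not the weighted non-binary model proposed in the paper, whose new measures are not given in the abstract; the toy matrix and function names are illustrative assumptions.

```python
def coverage(matrix, genes):
    """Number of samples mutated in at least one gene of the set."""
    return sum(1 for row in matrix if any(row[g] for g in genes))


def coverage_overlap(matrix, genes):
    """Mutations in the set beyond one per covered sample."""
    total_mutations = sum(row[g] for row in matrix for g in genes)
    return total_mutations - coverage(matrix, genes)


def weight(matrix, genes):
    """Dendrix-style score: high coverage, low overlap (high exclusivity)."""
    return coverage(matrix, genes) - coverage_overlap(matrix, genes)


# Hypothetical binary matrix: rows are samples, columns are genes 0..2.
toy_matrix = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]

exclusive_pair = weight(toy_matrix, [0, 1])    # covers 4 samples, no overlap
overlapping_pair = weight(toy_matrix, [1, 2])  # covers 2 samples, 1 overlap
```

A search procedure (a genetic algorithm or MCMC sampler in the baselines named above) then looks for the gene set of a given size that maximizes such a score.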
With the rapid development of deep sequencing technologies, a large amount of high-throughput data has become available for studying carcinogenic mechanisms at the molecular level. It is widely accepted that the development and progression of cancer are regulated by modules or pathways rather than by individual genes. The identification of cancer-related active modules has received extensive attention. In this paper, we put forward an identification method, ModFinder, that integrates biological networks and gene expression profiles. More concretely, a gene scoring function is devised using a regression model with a [Formula: see text]-step random walk kernel, and the genes are ranked according to both their active scores and their degrees in the PPI network. A greedy algorithm, NSEA, is then introduced to find an active module with a high score and strong connectivity. Experiments were performed on both simulated data and real biological data, namely breast cancer and cervical cancer. Compared with the previous methods SigMod, LEAN and RegMod, ModFinder shows competitive performance. It can successfully identify a well-connected module that contains a large proportion of cancer-related genes, including some well-known oncogenes and tumor suppressors enriched in cancer-related pathways.
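The idea of greedily growing a connected, high-scoring module can be illustrated with a generic seed-expansion sketch. This is not the NSEA algorithm, whose details the abstract does not give; the stopping rule, the gene names and the scores below are assumptions chosen only to show the general pattern.

```python
def greedy_module(adjacency, score, max_size):
    """Grow a connected module from the top-scoring gene.

    At each step, add the best-scoring neighbor of the current module;
    stop at max_size or when no neighbor has a positive score.
    """
    seed = max(score, key=lambda g: (score[g], g))
    module = {seed}
    while len(module) < max_size:
        # Candidate genes: neighbors of the module that are not yet in it.
        frontier = {
            n for g in module for n in adjacency.get(g, ()) if n not in module
        }
        if not frontier:
            break
        best = max(frontier, key=lambda g: (score.get(g, 0.0), g))
        if score.get(best, 0.0) <= 0:
            break  # assumed stopping rule: only positive scores extend the module
        module.add(best)
    return module


# Hypothetical toy network and active scores.
toy_adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
toy_score = {"A": 2.0, "B": 1.5, "C": -0.5, "D": 1.0, "E": 0.2}

small_module = greedy_module(toy_adjacency, toy_score, max_size=3)
full_module = greedy_module(toy_adjacency, toy_score, max_size=5)
```

Because every added gene is a neighbor of the current module, the result is connected by construction; how strongly connected it is depends on the scoring, which in ModFinder also accounts for node degree.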