Background: Identifying protein-protein interactions (PPIs) is essential for elucidating protein functions and understanding the molecular mechanisms inside the cell. However, experimental methods for detecting PPIs are both time-consuming and expensive. Computational prediction of protein interactions is therefore becoming increasingly popular: it provides an inexpensive way of predicting the most likely set of interactions at the entire proteome scale and can complement experimental approaches. Although much progress has been achieved in this direction, the problem is still far from solved, and new approaches are required to overcome the limitations of current prediction models. Results: In this work, a sequence-based approach is developed by combining a novel Multi-scale Continuous and Discontinuous (MCD) feature representation with a Support Vector Machine (SVM). The MCD representation gives adequate consideration to the interactions between sequentially distant but spatially close amino acid residues, so it can capture multiple overlapping continuous and discontinuous binding patterns within a protein sequence. An effective feature selection method, mRMR, is employed to construct an optimized and more discriminative feature set by excluding redundant features. Finally, a prediction model is trained and tested with the SVM algorithm to predict the interaction probability of protein pairs. Conclusions: When applied to the yeast PPI data set, the proposed approach achieved 91.36% prediction accuracy with 91.94% precision at a sensitivity of 90.67%. Extensive experiments were conducted to compare our method with existing sequence-based methods. The experimental results show that our predictor performs better than several other state-of-the-art predictors, whose average prediction accuracy is 84.91%, sensitivity is 83.24%, and precision is 86.12%. These results show that the proposed approach is very promising for predicting PPIs, so it can be a useful supplementary tool for future proteomics studies. The source code and the datasets are freely available at http://csse.szu.edu.cn/staff/youzh/MCDPPI.zip for academic use.
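For readers less familiar with the three figures quoted above, the following minimal Python sketch shows how accuracy, precision and sensitivity are conventionally derived from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, sensitivity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fp) == 0 or (tp + fn) == 0:
        raise ValueError("counts must give non-zero denominators")
    accuracy = (tp + tn) / total      # fraction of all pairs classified correctly
    precision = tp / (tp + fp)        # fraction of predicted interactions that are real
    sensitivity = tp / (tp + fn)      # fraction of real interactions that are recovered
    return accuracy, precision, sensitivity


# Hypothetical counts for illustration only; they are not results from the paper.
accuracy, precision, sensitivity = classification_metrics(tp=90, fp=10, tn=85, fn=15)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} sensitivity={sensitivity:.4f}")
```

Reporting precision "at" a given sensitivity, as the abstract does, simply means both values come from the same decision threshold of the classifier.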
Background: Bladder cancer is the most common malignant tumor of the urinary system, and it is a heterogeneous disease with both superficial and invasive growth. However, its aetiological agent is still unclear, so it is essential to find the key genes or modules that cause bladder cancer. Constructing differential co-expression networks (DCNs) from gene expression microarray datasets is an important method for investigating diseases, and good tools are available for it, such as the R packages 'WGCNA' and 'DCGL'. Results: Employing an integrated strategy, 36 up-regulated differentially expressed genes (DEGs) and 356 down-regulated DEGs were selected. The main functions of those DEGs are cellular physiological process (24 up-regulated DEGs; 167 down-regulated DEGs) and cellular metabolism (19 up-regulated DEGs; 104 down-regulated DEGs). The up-regulated DEGs are mainly involved in pathways related to "metabolism". By comparing the two DCNs for the normal and cancer states, we found substantial changes in hub genes and topological structure, which suggests that the modules of the two DCNs differ considerably. In particular, we screened hub genes of a differential subnetwork between the normal and cancer states and then performed bioinformatics analysis on them. Conclusions: By constructing and analyzing two differential co-expression networks at different states using the screened DEGs, we found some hub genes associated with bladder cancer. The results of the bioinformatics analysis of those hub genes will support biological experiments and the further treatment of bladder cancer.
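The idea of building one co-expression network per state and comparing them can be sketched as follows. This is a simplified illustration that assumes a hard threshold on the absolute Pearson correlation; it is not the procedure implemented in 'WGCNA' or 'DCGL', and the gene names, expression values and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expr, threshold=0.9):
    """expr maps gene -> expression values across samples; returns a set of edges."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def degrees(edges):
    """Number of co-expression partners per gene."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Hypothetical toy expression profiles (4 genes x 4 samples) for each state.
normal = {"G1": [1.0, 2.0, 3.0, 4.0], "G2": [2.0, 4.0, 6.0, 8.0],
          "G3": [4.0, 3.0, 2.0, 1.0], "G4": [1.0, 3.0, 2.0, 4.0]}
cancer = {"G1": [1.0, 2.0, 3.0, 4.0], "G2": [3.0, 1.0, 4.0, 2.0],
          "G3": [2.0, 4.0, 1.0, 3.0], "G4": [1.1, 2.0, 2.9, 4.2]}

normal_edges = coexpression_edges(normal)
cancer_edges = coexpression_edges(cancer)
print("normal:", sorted(normal_edges), degrees(normal_edges))
print("cancer:", sorted(cancer_edges), degrees(cancer_edges))
```

Genes whose degree changes most between the two edge sets are natural candidates for the kind of hub-gene comparison the abstract describes.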
Cervical cancer is the third most common malignancy in women worldwide, and it remains a leading cause of cancer-related death for women in developing countries. To contribute to the treatment of cervical cancer, this work seeks a small set of key genes involved in the disease. Using several bioinformatics tools, we selected 143 differentially expressed genes (DEGs) associated with cervical cancer. The results of bioinformatics analysis show that these DEGs play important roles in the development of cervical cancer. By comparing two differential co-expression networks (DCNs) at two different states, we found a common sub-network and two differential sub-networks, as well as some hub genes in the three sub-networks. Moreover, some of the hub genes have been reported to be related to cervical cancer. Those hub genes were analyzed from three aspects: Gene Ontology function enrichment, pathway enrichment and protein binding. The results can help us understand the development of cervical cancer and guide further experiments on it.
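Given two networks expressed as edge lists, splitting them into one common and two differential sub-networks reduces to set operations. The sketch below is an assumption about how such a split could be done, not the authors' pipeline; node names are placeholders and "hub" is taken to mean highest degree.

```python
from collections import Counter


def normalize(edges):
    """Treat edges as undirected by sorting each node pair."""
    return {tuple(sorted(edge)) for edge in edges}


def split_subnetworks(edges_a, edges_b):
    """Return (common, only_in_a, only_in_b) edge sets."""
    a, b = normalize(edges_a), normalize(edges_b)
    return a & b, a - b, b - a


def hub_genes(edges, top=1):
    """Rank nodes by degree (ties broken alphabetically) and keep the top ones."""
    degree = Counter(node for edge in edges for node in edge)
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Hypothetical edge lists for the two states; node names are placeholders.
state_a = [("A", "B"), ("B", "C"), ("C", "D")]
state_b = [("B", "A"), ("A", "E"), ("A", "F"), ("C", "D")]

common, only_a, only_b = split_subnetworks(state_a, state_b)
print("common:", sorted(common))
print("only in state A:", sorted(only_a))
print("only in state B:", sorted(only_b), "hub:", hub_genes(only_b))
```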
Several methods have been proposed for transforming protein sequences into feature vectors, such as auto covariance (AC), conjoint triad (CT), local descriptor (LD), Moran autocorrelation (MA) and normalized Moreau-Broto autocorrelation (NMB). In this paper, we adopt each of these transformation methods in turn to encode the proteins, and denote any one of AC, CT, LD, MA and NMB by '+' in a unified manner. A new method, the combination of least squares regression with '+' (abbreviated as LSR(+)), is introduced for encoding a protein-protein correlation-based feature representation of an interacting protein pair. There are thus five different combinations for LSR(+) in total: LSRAC, LSRCT, LSRLD, LSRMA and LSRNMB. We then combined a support vector machine (SVM) approach with LSR(+) to predict protein-protein interactions (PPIs) and PPI networks. The proposed method has been applied to four datasets: Saccharomyces cerevisiae, Escherichia coli, Homo sapiens and Caenorhabditis elegans. The experimental results demonstrate that all LSR(+) methods outperform many existing representative algorithms. Therefore, LSR(+) is a powerful tool for characterizing protein-protein correlations and inferring PPIs, while maintaining high performance on the prediction of PPI networks.
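As an illustration of one of the encodings named above, the following sketch computes a plain auto covariance (AC) descriptor for a single physicochemical property. It assumes the Kyte-Doolittle hydropathy scale as the property, skips the property normalization that AC encodings usually apply, and uses a made-up sequence; the LSR step that combines two encoded proteins is not shown because the abstract does not specify it.

```python
# Kyte-Doolittle hydropathy values; using this property is an assumption made
# for illustration, not a choice stated in the abstract.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, scale, max_lag):
    """Return [AC(1), ..., AC(max_lag)] of one property along a sequence.

    AC(lag) = 1/(n - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean)
    A residue missing from `scale` raises KeyError.
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be positive and shorter than the sequence")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


# Hypothetical peptide, used only to show the call.
features = auto_covariance("MKTAYIAKQR", KD_HYDROPATHY, max_lag=3)
print(features)
```

Because the output length depends only on `max_lag` (and on the number of properties used), proteins of different lengths map to vectors of the same size, which is what a downstream classifier such as an SVM needs.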
In this study, to make use of complementary information from different types of data for better diagnosis of disease status, we combined gene expression data with DNA methylation data and generated a fused network, from which the stages of KIRC (Kidney Renal Cell Carcinoma) can be better identified. It is well recognized that a network is important for investigating the connectivity of disease groups, so we exploited the potential of the network's features to identify the KIRC stage. We first constructed a patient network from each type of data. We then built a fused network using a network fusion method. Based on the link weights between patients, we used a generalized linear model to predict the group of KIRC subjects. Finally, the group prediction method was applied to test the power of network-based features. The performance (e.g., the accuracy of identifying cancer stages) obtained with the fused network from the two data types is shown to be superior to that obtained with either patient network built from a single data type. The work provides a good example of using network-based features from multiple data types for a more comprehensive diagnosis.
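The first two steps, one patient network per data type followed by fusion, can be sketched as below. The abstract does not name the fusion algorithm, so this example swaps in a plain element-wise average of Gaussian-kernel similarity matrices as a stand-in; the kernel, the `sigma` value and the patient profiles are all hypothetical, and the generalized linear model step is not shown.

```python
from math import exp


def similarity_matrix(profiles, sigma=1.0):
    """Patient-by-patient Gaussian similarity from a list of feature vectors."""
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = exp(-dist2 / (2 * sigma ** 2))
    return sim


def fuse(networks):
    """Average the link weights of several patient networks (simple stand-in)."""
    n = len(networks[0])
    k = len(networks)
    return [[sum(net[i][j] for net in networks) / k for j in range(n)]
            for i in range(n)]


# Hypothetical profiles for 4 patients: 2 expression features, 1 methylation feature.
expression = [[1.0, 1.0], [1.2, 0.9], [3.0, 3.1], [3.1, 2.9]]
methylation = [[0.20], [0.25], [0.80], [0.75]]

fused = fuse([similarity_matrix(expression), similarity_matrix(methylation)])
for row in fused:
    print([round(weight, 3) for weight in row])
```

Each row of `fused` holds one patient's link weights to all other patients, which is the kind of network-based feature vector that a classifier could then take as input.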