Accurate identification of drug targets in the human body is of great significance for designing novel drugs. Compared with traditional experimental methods, prediction of drug targets via machine learning algorithms has attracted the attention of many researchers because it is fast and accurate. In this study, we propose a machine learning-based method, namely XGB-DrugPred, for accurate prediction of druggable proteins. Features are extracted from primary protein sequences by group dipeptide composition, reduced amino acid alphabet, and a novel encoder, pseudo amino acid composition segmentation. To select the best feature set, eXtreme Gradient Boosting-recursive feature elimination is implemented. The best feature set is provided to eXtreme Gradient Boosting (XGB), Random Forest, and Extremely Randomized Tree classifiers for model training and prediction. The performance of these classifiers is evaluated by tenfold cross-validation. The empirical results show that the XGB-based predictor achieves the best results compared with the other classifiers and existing methods in the literature.
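To make the group dipeptide composition step concrete, the following is a minimal sketch of one common way to compute it. The five-class grouping of amino acids, the function name, and the example sequence are assumptions for illustration; the abstract does not state which grouping XGB-DrugPred uses.

```python
from itertools import product

# Assumed five-class grouping of the 20 standard amino acids
# (illustrative only; not taken from the abstract).
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}
# 5 x 5 = 25 ordered group pairs, in a fixed order.
GROUP_PAIRS = [a + "." + b for a, b in product(GROUPS, repeat=2)]


def grouped_dipeptide_composition(sequence):
    """Return a 25-dimensional vector of group-pair frequencies."""
    counts = dict.fromkeys(GROUP_PAIRS, 0)
    total = 0
    # Slide over adjacent residue pairs; skip pairs with non-standard residues.
    for left, right in zip(sequence, sequence[1:]):
        if left in RESIDUE_TO_GROUP and right in RESIDUE_TO_GROUP:
            counts[RESIDUE_TO_GROUP[left] + "." + RESIDUE_TO_GROUP[right]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(GROUP_PAIRS)
    return [counts[pair] / total for pair in GROUP_PAIRS]


# Hypothetical toy sequence, not a real protein.
features = grouped_dipeptide_composition("MKVLAAGDEFW")
```

A vector like this would be concatenated with the other encodings before feature selection and classification; those later steps are not shown here.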
The study of disease-pathway associations in human diseases is a perennial focus of the biomedical field. The association of diseases and pathways can help in the discovery of the mechanisms or relationships of human diseases. The accuracy of disease identification has been less than satisfactory despite decades of research in this area. Therefore, this study proposes a computational model for the prediction of disease-pathway associations. The proposed computational model is based on Random Walk with Restart on a heterogeneous network (RWRH) and PageRank. The RWRH disease-pathway association model is a novel computational model that can predict potential disease-pathway associations. Furthermore, the model can help pathologists understand the correlations among disease-pathway associations, treatments, and reactions. We performed a pathway-based study to expand disease variation relationships and to find new molecular correlations between genetic mutations. We constructed a biological network on the basis of shared gene interactions of disease-pathways and attempted to investigate the pathogenesis of a disease by analyzing the constructed network. The network construction was based on two parts. First, the similarity between pathway-pathway networks was calculated. Second, a disease-disease (DD) similarity network was constructed, and the correlation between disease and disease similarity was calculated. We also investigated the pathway seed nodes and disease seed nodes with high PageRank. Moreover, we focused on mining the complexity of disease-pathway associations. We used the bipartite network of disease-pathway associations to combine the obtained biological information, which was based on the pair similarity of sequence expression weights. These weights, which were obtained by using the multilayer resource-allocation algorithm, were used to calculate the prediction scores of each disease-pathway pair.
Here, through leave-one-out cross-validation, we examined a 210 × 1855 matrix, with the 210 rows representing diseases and the 1855 columns representing pathways. The disease-pathway adjacency matrix contained 13,838 known disease-pathway associations. The best predictive results achieved an area-under-the-curve value of 0.8218 and a two-class precision-recall curve. These results indicate that our method performs better than previously proposed methods. We predicted pathogen, DD, and disease-pathway relationships by comparing them with known associations and through publication search. We then proposed the possible reasons for our predictions.
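The core iteration behind random walk with restart can be sketched as follows. This is a simplified illustration on a single combined toy network, not the heterogeneous-network formulation of the study; the function name, the restart probability of 0.7, the convergence tolerance, and the toy graph are all assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adjacency: square list of lists of non-negative edge weights.
    seeds: indices of seed nodes that share the restart probability equally.
    """
    n = len(adjacency)
    # Column-normalize so each column of W holds transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        # Stop when the L1 change between iterations is below the tolerance.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: nodes 0-1 are diseases, nodes 2-4 are pathways.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
scores = random_walk_with_restart(toy, seeds=[0])
# Rank candidate pathways for the seed disease by steady-state probability.
ranked_pathways = sorted([2, 3, 4], key=lambda i: scores[i], reverse=True)
```

In this toy case the pathway directly linked to the seed disease ranks first, which is the intuition behind using steady-state probabilities as association scores.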
Predicting the protein sequence information of enzymes and non-enzymes is an important but very challenging task. Existing methods use either protein geometric structures alone or protein sequences alone to predict enzymatic functions; thus, their prediction results are unsatisfactory. In this paper, we propose a novel approach for predicting the amino acid sequences of enzymes and non-enzymes via a Convolutional Neural Network (CNN). In the CNN, the roles of enzymes are predicted from multiple sides of biological information, including information on sequences and structures. We propose the use of two-dimensional data via a 2D CNN to predict enzyme and non-enzyme proteins by using the same fivefold cross-validation procedure. We also use an independent dataset to test the performance of our model, and the results indicate that the overfitting problem is addressed. We evaluated the proposed CNN model with different numbers of filters (32, 64, and 128), using both the fivefold cross-validation set and the independent test set. Via the Dipeptide Deviation from Expected Mean (DDE) matrix, mutation information is extracted from amino acid sequences, and structural information on the distances and angles of amino acids is conveyed. The derived feature maps are then encoded using DDE. Results on the independent dataset are then compared with those of two other methods, namely GRU and XGBoost. The cross-validation datasets achieved an accuracy score of 0.8762%, whereas the accuracy on the independent datasets was 0.7621%. In addition, the ROC AUC score achieved with fivefold cross-validation was 0.95%. The performance of our model and that of other models was compared in terms of sensitivity (0.9028%) and specificity (0.8497%). The overall accuracy of our model was 0.9133% compared with 0.8310% for the other model.
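As an illustration of the DDE encoding mentioned above, the sketch below follows the commonly used definition: observed dipeptide composition, minus a codon-based theoretical mean, divided by the theoretical standard deviation. The abstract does not give the formula, so this definition, the function names, the 20 × 20 reshaping for a 2D CNN input, and the toy sequence are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Number of codons encoding each amino acid in the standard genetic code
# (61 sense codons in total; stop codons are excluded).
CODON_COUNT = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = sum(CODON_COUNT.values())


def dde_features(sequence):
    """Return a dict mapping each of the 400 dipeptides to its DDE value."""
    n_pairs = len(sequence) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    for a, b in zip(sequence, sequence[1:]):
        counts[a + b] = counts.get(a + b, 0) + 1
    features = {}
    for a in AMINO_ACIDS:
        for b in AMINO_ACIDS:
            dc = counts.get(a + b, 0) / n_pairs  # observed composition
            tm = (CODON_COUNT[a] / TOTAL_CODONS) * (CODON_COUNT[b] / TOTAL_CODONS)  # theoretical mean
            tv = tm * (1 - tm) / n_pairs  # theoretical variance
            features[a + b] = (dc - tm) / math.sqrt(tv)
    return features


def dde_matrix(features):
    """Arrange the 400 DDE values as a 20 x 20 grid (assumed 2D CNN input)."""
    return [[features[a + b] for b in AMINO_ACIDS] for a in AMINO_ACIDS]


# Hypothetical toy sequence, not a real protein.
features = dde_features("MKVLAAGDEFW")
matrix = dde_matrix(features)
```

Dipeptides that occur more often than the codon-based expectation get positive values and absent ones get negative values, so the grid carries signed deviations rather than raw counts.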
This study focused on describing the necessary information related to pathway mechanisms, pathway characteristics, and pathway database feature annotations. Various difficulties related to data storage and data retrieval in biological pathway databases are discussed, with a focus on different techniques for retrieving annotations, features, and methods of digital pathway databases for biological pathway analysis. Furthermore, many pathway database annotations, features, and search databases that are suitable for integration into microarray analysis were also examined. The investigation was performed on pathway databases that contain human pathways, in order to understand the hidden components of cells applied in this process. Three different domain-specific pathways were selected for this study, and the information on pathway databases was extracted from the existing literature. The research compared different pathways and examined molecular-level relations. Moreover, the associations between pathway networks were also evaluated. The study involved datasets for gene pathway matrices and pathway scoring techniques. Additionally, different pathway types, such as metabolomic and biochemical pathways, translation, control, and signaling pathways, and signal transduction, were also considered. We also analyzed the list of gene sets and constructed a gene pathway network. This article will serve as a useful manual for storing a repository of specific biological data and disease pathways.
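One simple way to turn gene sets into a gene pathway matrix and a pathway network is sketched below. The function names, the Jaccard weighting of shared genes, and the placeholder gene and pathway names are assumptions for illustration and are not taken from the study.

```python
def gene_pathway_matrix(gene_sets):
    """Build a binary gene-by-pathway membership matrix from named gene sets."""
    genes = sorted({g for members in gene_sets.values() for g in members})
    pathways = sorted(gene_sets)
    matrix = [[1 if g in gene_sets[p] else 0 for p in pathways] for g in genes]
    return genes, pathways, matrix


def pathway_network(gene_sets, min_shared=1):
    """Link pathways that share genes; weight each edge by Jaccard similarity."""
    edges = {}
    names = sorted(gene_sets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = set(gene_sets[a]) & set(gene_sets[b])
            if len(shared) >= min_shared:
                union = set(gene_sets[a]) | set(gene_sets[b])
                edges[(a, b)] = len(shared) / len(union)
    return edges


# Placeholder names only; not real genes or pathways.
toy_sets = {
    "pathway_1": {"geneA", "geneB"},
    "pathway_2": {"geneB", "geneC"},
    "pathway_3": {"geneD"},
}
genes, pathways, matrix = gene_pathway_matrix(toy_sets)
edges = pathway_network(toy_sets)
```

Here only pathway_1 and pathway_2 are connected, because they are the only pair with a gene in common.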