Coronavirus disease 2019 pandemic spreads rapidly and requires an acceleration in the process of drug discovery. Drug repurposing can help accelerate the drug discovery process by identifying new efficacy for approved drugs, and it is considered an efficient and economical approach. Research in drug repurposing can be done by observing the interactions of drug compounds with protein related to a disease (DTI), then predicting the new drug-target interactions. This study conducted multilabel DTI prediction using the stack autoencoder-deep neural network (SAE-DNN) algorithm. Compound features were extracted using PubChem fingerprint, daylight fingerprint, MACCS fingerprint, and circular fingerprint. The results showed that the SAE-DNN model was able to predict DTI in COVID-19 cases with good performance. The SAE-DNN model with a circular fingerprint dataset produced the best average metrics with an accuracy of 0.831, recall of 0.918, precision of 0.888, and F-measure of 0.89. Herbal compounds prediction results using the SAE-DNN model with the circular, daylight, and PubChem fingerprint dataset resulted in 92, 65, and 79 herbal compounds contained in herbal plants in Indonesia respectively.
Jamu is an Indonesian traditional herbal medicine that has been practiced for generations. Jamu is made from various medicinal plants. Each plant has several compounds directly related to the target protein that are directly associated with a disease. A pharmacological graph can form relationships between plants, compounds, and target proteins. Research related to the prediction of Jamu formulas for some diseases has been carried out, but there are problems in finding combinations or compositions of Jamu formulas because of the increase in search space size. Some studies adopted the drug–target interaction (DTI) implemented using machine learning or deep learning to predict the DTI for discovering the Jamu formula. However, this approach raises important issues, such as imbalanced and high-dimensional dataset, overfitting, and the need for more procedures to trace compounds to their plants. This study proposes an alternative approach by implementing bipartite graph search optimization using the branch and bound algorithm to discover the combination or composition of Jamu formulas by optimizing the search on a plant–protein bipartite graph. The branch and bound technique is implemented using the search strategy of breadth first search (BrFS), Depth First Search, and Best First Search. To show the performance of the proposed method, we compared our method with a complete search algorithm, searching all nodes in the tree without pruning. In this study, we specialize in applying the proposed method to search for the Jamu formula for type II diabetes mellitus (T2DM). The result shows that the bipartite graph search with the branch and bound algorithm reduces computation time up to 40 times faster than the complete search strategy to search for a composition of plants. The binary branching strategy is the best choice, whereas the BrFS strategy is the best option in this research. In addition, the the proposed method can suggest the composition of one to four plants for the T2DM Jamu formula. For a combination of four plants, we obtain Angelica Sinensis, Citrus aurantium, Glycyrrhiza uralensis, and Mangifera indica. This approach is expected to be an alternative way to discover the Jamu formula more accurately.
Coronavirus disease 2019 is an infectious disease that causes severe respiratory, digestive, and systemic infections that caused a pandemic in 2019. One of the focuses of the drug development process to fight the coronavirus disease 2019 is by carrying out drug repurposing. This study uses random forest with a feature-based chemogenomics approach on the drug-target interaction data of coronavirus disease 2019. The feature extraction process is carried out on compounds and protein using PubChem fingerprint and amino acid composition respectively. Feature selection using XGBoost is done to reduce the data dimension. The random undersampling process was also carried out to solve the problem of imbalanced data in the dataset. Using the cross-validation process, the random forest model produced an average accuracy value of 0.98, recall value of 0.92, precision value of 0.95, AUROC value of 0.95, and F1 score of 0.93. The random forest model also produced an accuracy value of 0.99, recall value of 0.93, the precision value of 0.94, AUROC value of 0.99, and F-measure of 0.94 when used to predict the original dataset (dataset without random undersampling process).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.