Metabolic pathways are the consecutive chemical reactions that make up metabolism in vivo. Compounds are major participants in most metabolic pathways, so it is essential to determine which compounds can constitute a given pathway. This problem can be recast as identifying the metabolic pathways in which a compound participates. Although traditional experiments provide solid results, they are inefficient and costly. To date, several machine learning models have been proposed to address this problem. However, almost all of them identify only the metabolic pathway types of compounds rather than the actual metabolic pathways. This study proposed a novel model for predicting the actual metabolic pathways of given compounds. Pairs of compounds and metabolic pathways were treated as samples, which turns the task into a binary classification problem. Using the concept of "similarity", each sample was represented by seven features extracted from seven compound associations, each of which measures compound linkage from a different aspect. The model adopted random forest as the classification algorithm. Two types of tenfold cross-validation were used to evaluate the model, and the results indicated its utility. A feature analysis was also performed to determine which compound associations were most closely related to the identification of metabolic pathways of compounds.
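The pair-based formulation above can be illustrated with a minimal sketch. The abstract does not say how the seven similarity features are computed, so the rule used here (the maximum similarity between the query compound and the other known members of the pathway, taken once per association) is an assumption, and the names `pair_features`, `build_samples`, and `lookup`, along with all toy scores, are hypothetical. Only two toy associations stand in for the seven described.

```python
from typing import Callable, Dict, List, Set, Tuple

# A similarity function scores how strongly two compounds are linked
# under one association. Here these are toy lookups, not real data.
Similarity = Callable[[str, str], float]


def lookup(table: Dict[frozenset, float]) -> Similarity:
    """Turn a symmetric score table into a similarity function."""
    return lambda a, b: table.get(frozenset((a, b)), 0.0)


def pair_features(compound: str, members: Set[str],
                  similarities: List[Similarity]) -> List[float]:
    """One feature per association (ASSUMED rule): the highest similarity
    between the compound and any other known member of the pathway."""
    others = sorted(m for m in members if m != compound)
    if not others:
        return [0.0] * len(similarities)
    return [max(sim(compound, m) for m in others) for sim in similarities]


def build_samples(compounds: List[str], pathways: Dict[str, Set[str]],
                  similarities: List[Similarity]
                  ) -> List[Tuple[str, str, List[float], int]]:
    """Every (compound, pathway) pair becomes one binary sample:
    label 1 if the compound is a known member, otherwise 0."""
    samples = []
    for c in compounds:
        for name, members in pathways.items():
            x = pair_features(c, members, similarities)
            y = 1 if c in members else 0
            samples.append((c, name, x, y))
    return samples


# Toy data: three compounds, two pathways, two associations.
assoc_a = lookup({frozenset(("C1", "C2")): 0.9,
                  frozenset(("C1", "C3")): 0.2,
                  frozenset(("C2", "C3")): 0.4})
assoc_b = lookup({frozenset(("C1", "C2")): 0.5,
                  frozenset(("C1", "C3")): 0.1,
                  frozenset(("C2", "C3")): 0.7})
toy_pathways = {"P1": {"C1", "C2"}, "P2": {"C3"}}

if __name__ == "__main__":
    for row in build_samples(["C1", "C2", "C3"], toy_pathways,
                             [assoc_a, assoc_b]):
        print(row)
```

The resulting feature vectors and labels are in the form a binary classifier such as a random forest expects as input. Excluding the query compound from its own pathway's member list keeps a positive sample from being scored against itself.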
Background: Metabolic pathways are among the most basic biological pathways in living organisms. A metabolic pathway consists of a series of chemical reactions and provides the molecules and energy that organisms need. To date, many metabolic pathways have been detected. However, because of their complexity and diversity, some metabolic pathways still have hidden participants (compounds and enzymes). It is therefore necessary to develop a quick, reliable, non-animal-involved prediction model that recognizes the metabolic pathways of any compound. Methods: In this study, a multi-label classifier, namely iMPT-FRAKEL, was developed to identify the metabolic pathway types in which compounds can participate. Compounds and 12 metabolic pathway types were retrieved from KEGG. Each compound was represented by its fingerprints, the most widely used form for representing compounds, which can be extracted from its SMILES format. A popular multi-label classification scheme, the Random k-Labelsets (RAKEL) algorithm, was adopted to build the classifier. A classic machine learning algorithm, the Support Vector Machine (SVM) with an RBF kernel, was selected as the base classification algorithm. Ten-fold cross-validation was used to evaluate the performance of iMPT-FRAKEL. In addition, a web-server version of the classifier was set up, which can be accessed at http://cie.shmtu.edu.cn/impt/index. Results: iMPT-FRAKEL yielded an accuracy of 0.804, an exact match of 0.745, and a Hamming loss of 0.039. Comparisons indicated that the classifier was superior to other models, including models using Binary Relevance (BR) or other classification algorithms. Conclusion: The proposed classifier employs limited prior knowledge of compounds but gives satisfactory performance in recognizing the metabolic pathways of compounds.
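The three reported measures can be made concrete with a short sketch. The abstract does not define them, so the example-based definitions commonly used in the multi-label literature are assumed here: accuracy as the mean Jaccard overlap between true and predicted label sets, exact match as the fraction of samples whose predicted set equals the true set, and Hamming loss as the mean fraction of the labels that are wrongly assigned or wrongly missed. The function name `multilabel_metrics` and the toy label sets are hypothetical and do not come from the study.

```python
from typing import Dict, List, Set


def multilabel_metrics(true_sets: List[Set[int]],
                       pred_sets: List[Set[int]],
                       n_labels: int) -> Dict[str, float]:
    """Example-based multi-label measures (assumed definitions)."""
    if not true_sets or len(true_sets) != len(pred_sets):
        raise ValueError("need two non-empty lists of equal length")

    acc = exact = ham = 0.0
    for y, z in zip(true_sets, pred_sets):
        union = y | z
        # Jaccard overlap; two empty sets count as a perfect match.
        acc += len(y & z) / len(union) if union else 1.0
        # Exact match requires the whole label set to agree.
        exact += 1.0 if y == z else 0.0
        # Symmetric difference = labels wrongly added or wrongly missed.
        ham += len(y ^ z) / n_labels

    n = len(true_sets)
    return {"accuracy": acc / n,
            "exact_match": exact / n,
            "hamming_loss": ham / n}


if __name__ == "__main__":
    # Toy example with 12 possible labels, matching the 12 pathway types.
    y_true = [{0, 1}, {2}, {3, 4}]
    y_pred = [{0, 1}, {2, 5}, {3}]
    print(multilabel_metrics(y_true, y_pred, n_labels=12))
```

Higher accuracy and exact match are better, while a lower Hamming loss is better, so the three figures should be read together: exact match penalizes any mistake in a label set, whereas Hamming loss counts each wrong label separately.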