Classification of multi-family enzymes by multi-label machine learning and sequence-based descriptors

Wang, Yuelong; Jing, Runyu; Hua, Yongpan; Fu, Yuanyuan; Dai, Xu; Huang, Liqiu; Li, Menglong

doi:10.1039/c4ay01240b

Cited by 9 publications

(8 citation statements)

References 57 publications

(65 reference statements)


“…The results showed that the ability to predict the enzyme in the subclass of oxidoreductases is somewhat poor, with a success rate by the jackknife test of 86.7%. In 2014, Wang et al [34] adopted several different methods in feature extraction and classification of feature extraction methods to match one classification method and compared the prediction results of four prediction models. They found that the best prediction model is the combination of RAkEL-RF and CTD, with which the highest accuracy with 10-fold cross validation of the training data reached 97.99% and the test data reached 97.57%.…”

Section: B Comment On Published Results (mentioning)

confidence: 99%

“…For instance, Shen et al [33] combined functional domain (FunD) and pseudo position-specific scoring matrix (PsePSSM) to extract features in 2009. Wang et al [34] combined composition, transition and distribution (CTD) and pseudo-amino acid composition (PseAAC) to extract features and classify sequences with the combination of the methods of random-k-label-random forest (RAkEL-RF) and multi-label KNN (MLKNN) in 2014. In 2019, Ryu et al [35] used DeepEC, consisting of three different convolutional neural network (CNN) structures in enzyme classification.…”

Section: Introduction (mentioning)

confidence: 99%


The Classification of Enzymes by Deep Learning

et al. 2020


Enzymes, a group of crucial biocatalysts produced by living cells, make the chemical reactions in organisms more efficient. According to the properties of the reactions that enzymes catalyze, the Enzyme Commission (EC) number system divided enzymes into six main classes in 1961: oxidoreductases (EC1), transferases (EC2), hydrolases (EC3), lyases (EC4), isomerases (EC5), and ligases (EC6). These six categories did not change for many years until a new class, the translocases (EC7), was added in August 2018. Different enzymes catalyze reactions with different properties, and predicting enzyme classes is an important research topic: knowing the category of an enzyme allows us to further study its structure and function. Because the number of enzymes whose function remains unknown is enormous, determining enzyme characteristics by biological experiments is time-consuming. Devising computational models to predict enzyme classes has therefore become a feasible alternative. In the hope of giving researchers more inspiration and ideas for predicting the EC number of enzymes by machine learning, we summarize a variety of research methods used in the prediction of enzyme families in this research.

INDEX TERMS Commission, enzyme classification, machine learning, bioinformatics.


“…These enzymes have been well identified and characterized in plants, bacteria and fungi, and are engaged as an industrially important biocatalyst for the production of bulk and fine chemicals. For example, mandelonitrile could be hydrolyzed to optically pure (R)-(-)-mandelic acid, which is widely used for the production of semisynthetic cephalosporins, penicillins, antitumor agents, and anti-obesity agents (Wang et al 2014). Researchers have revealed that nitrilases play a vital role in various biological processes and plant-microbe interaction, but despite their valuable importance they are relatively less explored for their metabolic functions.…”

Section: Introduction (mentioning)

confidence: 99%

Classifying nitrilases as aliphatic and aromatic using machine learning technique

et al. 2018


ProCos (Protein Composition Server, script version), a machine learning tool, was used to classify nitrilases as aliphatic or aromatic. Two feature vectors were used to train the algorithm: pseudo-amino acid composition (PAAC) and five-factor solution score (5FSS). These features clearly separated the nitrilases into two groups, aliphatic and aromatic. Pseudo-amino acid composition achieved a maximum sensitivity of 100.00%, specificity of 90.00%, accuracy of 95.00% and Matthews correlation coefficient (MCC) of about 0.90. Five-factor solution score achieved a sensitivity of 96.00%, specificity of 84.00%, accuracy of 90.00% and MCC of about 0.81. The total count of aliphatic amino acids, Ala (A), Gly (G), Leu (L), Ile (I), Val (V), Met (M) and Pro (P), was higher in aliphatic nitrilases (42.7) than in aromatic nitrilases (40.1). Conversely, the count of aromatic amino acids, Tyr (Y), Trp (W), His (H) and Phe (F), was higher in aromatic nitrilases (12.7) than in aliphatic nitrilases (10.7). This approach will help in predicting whether a nitrilase is aromatic or aliphatic based on its amino acid sequence. The scripts are available on GitHub by searching for the keyword 'Nitrilase' or at 'https://github.com/rover2380/Nitrilase.git'.
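The four metrics reported in this abstract all derive from a binary confusion matrix, and a short sketch may help show how they relate. The abstract does not give test-set sizes, so the counts below are assumptions chosen only because balanced classes of these sizes reproduce the reported percentages; the function name `binary_metrics` is likewise hypothetical and is not part of ProCos.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Assumed counts (not from the paper): 10 + 10 sequences for PAAC,
# 25 + 25 sequences for 5FSS, picked to match the reported percentages.
paac = binary_metrics(tp=10, fn=0, tn=9, fp=1)
fss = binary_metrics(tp=24, fn=1, tn=21, fp=4)

print({k: round(v, 2) for k, v in paac.items()})
print({k: round(v, 2) for k, v in fss.items()})
```

Under these assumed counts the MCC values come out near 0.90 and 0.81, which is consistent with the figures quoted in the abstract; with unbalanced classes the same sensitivity and specificity would give a different MCC.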


“…Combination of sequence, structure, and chemical properties of enzymes was also explored by Borgwardt et al (2005) using kernel methods and SVM on the BRENDA database and achieved an accuracy of 93% with six-fold cross-validation on information extracted through protein graph models. Multi-label classification using different methods such as RAkEL-RF and MLKNN (Wang et al, 2014) or MULAN (Zou et al, 2013) was performed on single- and multi-labeled enzymes. In particular, the latter was assessed on enzymes from the Swiss-Prot database based on their amino acid composition and their physico-chemical properties and involved the use of position-specific scoring matrices.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic single- and multi-label enzymatic function prediction by machine learning

Amidi, Vlachakis, et al. 2017, PeerJ


The number of protein structures in the PDB database has increased more than 15-fold since 1999. The creation of computational models predicting enzymatic function is of major importance, since such models provide the means to better understand the behavior of newly discovered enzymes when catalyzing chemical reactions. Until now, single-label classification has been widely used for predicting enzymatic function, which limits the application to enzymes performing unique reactions and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes may perform different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and the decision level) and assess the methodology for general enzymatic function prediction indicated by the first digit of the enzyme commission (EC) code (six main classes) on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming-loss) of the cases, respectively. The multi-label model also predicts all possible enzymatic reactions in 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at .
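The abstract reports multi-label performance "based on Hamming-loss" alongside the share of enzymes whose full label set is recovered. A minimal sketch of how these two measures differ is given below, assuming each enzyme is encoded as a binary vector over the six main EC classes. The toy label matrices and the function names are illustrative assumptions, not the authors' code or data, and the reading of the reported percentage as one minus the Hamming loss is also an assumption.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual (enzyme, class) label slots predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total

def subset_accuracy(y_true, y_pred):
    """Fraction of enzymes whose whole label set is predicted exactly."""
    exact = sum(t == p for t, p in zip(y_true, y_pred))
    return exact / len(y_true)

# Toy data (assumed): 4 enzymes x 6 EC main classes, 1 = class assigned.
y_true = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],   # multi-functional enzyme
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],   # multi-functional enzyme
]
y_pred = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],   # one class missed
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],   # one extra class predicted
]

loss = hamming_loss(y_true, y_pred)
exact = subset_accuracy(y_true, y_pred)
print(f"Hamming loss: {loss:.4f}, 1 - loss: {1 - loss:.4f}, exact match: {exact:.2f}")
```

In this toy case only 2 of 24 label slots are wrong, so the Hamming-based score is high even though just half of the enzymes have their full label set recovered. This gap is one reason a Hamming-based figure and an all-labels-correct figure can differ noticeably for the same model.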
