2019
DOI: 10.1142/s021972001950029x

Prediction of oxidoreductase subfamily classes based on RFE-SND-CC-PSSM and machine learning methods

Abstract: Oxidoreductase is an enzyme that widely exists in organisms. It plays an important role in cellular energy metabolism and biotransformation processes. Oxidoreductases have many subclasses with different functions, creating an important classification task in bioinformatics. In this paper, a dataset of 2640 oxidoreductase sequences was used to perform an analysis and comparison. The idea of dipeptides was introduced to process the Position Specific Score Matrix (PSSM), since each dipeptide consists of two amino…

Cited by 4 publications (2 citation statements)
References 34 publications
“…where S_i (1 ≤ i ≤ L) represents the amino acid residue appearing in the ith position of S, and L is the length of S [44], [45]. The so-called evolutionary profile of S is the PSSM, which has been applied to the prediction of protein sequences in numerous previous studies, achieving good results [28], [29], [46]-[49]. In this study, we performed three iterative searches of the Uniref50 database using PSI-BLAST, setting the e-value to 0.001 to generate the BLOSUM62 replacement matrix.…”
Section: Position-specific Scoring Matrix Composition (mentioning)
confidence: 99%
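
The PSI-BLAST settings quoted above (three iterations, e-value 0.001, UniRef50) could be expressed roughly as follows. This is a minimal sketch, not the citing authors' pipeline: the helper name `build_psiblast_command` and the file and database paths are hypothetical, and mapping the quoted "e-value" to the BLAST+ `-evalue` option is an assumption (the authors may have meant the inclusion threshold instead).

```python
def build_psiblast_command(query_fasta, db_path, pssm_out,
                           iterations=3, evalue=0.001):
    """Build an argument list for the NCBI BLAST+ `psiblast` program.

    Hypothetical helper: paths are placeholders, and the defaults
    mirror the values quoted in the citation statement.
    """
    return [
        "psiblast",
        "-query", query_fasta,              # single-sequence FASTA file
        "-db", db_path,                     # formatted protein database, e.g. UniRef50
        "-num_iterations", str(iterations),
        "-evalue", str(evalue),             # assumption: quoted e-value maps to -evalue
        "-out_ascii_pssm", pssm_out,        # write the PSSM as a text file
    ]


cmd = build_psiblast_command("query.fasta", "uniref50", "query.pssm")
print(" ".join(cmd))
```

If BLAST+ is installed and the database has been formatted, the list can be passed to `subprocess.run(cmd, check=True)`; the sketch only builds the command and does not run it.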
“…With the rapid growth of sequences genomic data, dealing with large amounts of biological sequences data requires fast and accurate automated methods for identification and annotation. Therefore, many research groups are dedicated to the study of biological sequences extraction algorithms, feature selection, and classification algorithms using machine learning and deep learning methods, such as amino acid composition (AAC), pseudo amino acid composition (PseAAC), protein position-specific scoring matrix (PSSM), dipeptide composition (DipC), tripeptide composition (TPC), 20-D condensed feature vectors (CFV), general dipeptide composition (GDipC), Lasso feature selection, neural network (NN), support vector machine (SVM), k-nearest neighbor (KNN), random forest (RF) and decision tree (DT), etc., and successfully applied them to protein structure and functional spectrum classification and prediction [6–23]. Recently Stochastic Gradient Descent (SGD) has been successfully applied to the field of sparse and large-scale machine learning [24, 25].…”
Section: Introduction (mentioning)
confidence: 99%
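
Two of the sequence features named in the statement above, amino acid composition (AAC) and dipeptide composition (DipC), can be illustrated with a short sketch. It uses the common textbook definitions (relative frequency of each of the 20 residues, and of each of the 400 overlapping residue pairs); the function names are illustrative, and the cited papers may normalize or handle non-standard residues differently.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return a 20-D vector of residue frequencies (non-standard residues ignored)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def dipeptide_composition(seq):
    """Return a 400-D vector of overlapping dipeptide frequencies."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[d] / total for d in DIPEPTIDES]


aac = amino_acid_composition("AACD")
dpc = dipeptide_composition("AACD")
print(len(aac), len(dpc))
```

Both vectors sum to 1, so sequences of different lengths map to feature vectors of fixed size, which is what lets them feed classifiers such as SVM or KNN.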