2022
DOI: 10.2174/1574893616666210820095144

iAnt: Combination of Convolutional Neural Network and Random Forest Models Using PSSM and BERT Features to Identify Antioxidant Proteins

Abstract: Background: Reactive oxygen species (ROS) have many roles in the body, such as cell signaling, homeostasis, and protection from harmful bacteria. However, too much ROS in the body damages lipids, proteins, and DNA. Many studies show that numerous environmental factors increase the amount of ROS produced in the body. Antioxidant proteins are responsible for neutralizing these ROS or free radicals. Although the amount of data on protein sequences has increased over the last two decades, we still lack bioinformatics …

Cited by 21 publications (10 citation statements)
References 20 publications
“…Compared with the GBDT algorithm, XGBoost maximizes speed and efficiency. Random forest (RF) is an effective machine learning algorithm ( Ao et al, 2022b ; Tran and Nguyen, 2022 ; Naik et al, 2023 ) which is a random composition of many unrelated decision trees. When judging the category of a new sample, each RF decision tree makes an independent judgment and finally selects the category with the highest probability value.…”
Section: Results (mentioning)
confidence: 99%
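
The voting step described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes each tree has already produced class probabilities, and the function name `forest_predict` and the example labels are hypothetical.

```python
from collections import defaultdict


def forest_predict(tree_probabilities):
    """Average per-tree class probabilities and return (best_label, averaged).

    tree_probabilities: list of dicts mapping class label -> probability,
    one dict per decision tree (hypothetical input format).
    """
    if not tree_probabilities:
        raise ValueError("need at least one tree output")
    totals = defaultdict(float)
    for probs in tree_probabilities:
        for label, p in probs.items():
            totals[label] += p
    n_trees = len(tree_probabilities)
    averaged = {label: total / n_trees for label, total in totals.items()}
    # The forest's answer is the class with the highest averaged probability.
    best = max(averaged, key=averaged.get)
    return best, averaged


# Hypothetical outputs of three independent trees for one protein sequence.
votes = [
    {"antioxidant": 0.9, "non-antioxidant": 0.1},
    {"antioxidant": 0.4, "non-antioxidant": 0.6},
    {"antioxidant": 0.7, "non-antioxidant": 0.3},
]
label, averaged = forest_predict(votes)
print(label, averaged)
```

Averaging probabilities (soft voting) is one common way to combine trees; counting one hard vote per tree is another, and the two can disagree when individual trees are uncertain.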
“…Compared with the GBDT algorithm, XGBoost maximizes speed and efficiency. Random forest (RF) is an effective machine learning algorithm (Ao et al, 2022b; Tran and Nguyen, 2022; Naik et al, 2023) which is a random composition of many … Finally, a strong classifier will be obtained when the minimum error rate or the maximum number of iterations is reached. The decision tree classification algorithm constructs a tree-type classification model from the training samples (Shabbir et al, 2021).…”
Section: Performance Of Different Classifiers (mentioning)
confidence: 99%
“…We need to convert sequences into vectors in mathematical representation (Amanatidou and Dedoussis, 2021; Dao et al, 2022a; Jeon et al, 2022; Li H et al, 2022; Nidhi et al, 2022; Sun et al, 2022; Tran and Nguyen, 2022; Wang et al, 2022; Yang et al, 2022). The amino acid composition (ACC) of the protein has a great impact on its subcellular location (Chou and Elrod, 1999a; Awais et al, 2021; Chou and Elrod, 1999b; Rout et al, 2022; Naseer et al, 2021; Manavalan and Patra, 2022; Shoombuatong et al, 2022).…”
Section: Feature Encoding (mentioning)
confidence: 99%
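
As an illustration of converting a sequence into a vector, amino acid composition can be computed as the relative frequency of each of the 20 standard residues. The sketch below is an assumption about the general technique, not the encoding used by any of the cited papers; the function name and the example sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies for a protein."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Non-standard symbols (e.g. X) are ignored rather than counted.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 10-residue sequence used only to demonstrate the encoding.
vector = amino_acid_composition("MKTAYIAKQR")
print(len(vector), vector[AMINO_ACIDS.index("A")])
```

The resulting fixed-length vector can be passed to a classifier regardless of the original sequence length, which is the main reason this encoding is used as a baseline feature.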
“…We have compiled relevant research conducted by researchers in recent years and compared 11 models. In the process of feature extraction, there are many aspects of feature extraction, such as amino acid composition, protein secondary structure information, and physical and chemical properties of protein sequences, which play an important role in the identification of antioxidant proteins [5–7]. In the process of feature selection, we should not only select feature combinations with high contribution but also consider that the dimension of features should not be too high [8–10].…”
Section: Introduction (mentioning)
confidence: 99%
“…In the process of feature extraction, there are many aspects of feature extraction, such as amino acid composition, protein secondary structure information, and physical and chemical properties of protein sequences, which play an important role in the identification of antioxidant proteins [5–7]. In the process of feature selection, we should not only select feature combinations with high contribution but also consider that the dimension of features should not be too high [8–10]. A dimension that is too high will affect not only the efficiency of the model but also the accuracy of the model due to redundant features.…”
Section: Introduction (mentioning)
confidence: 99%