iAnt: Combination of Convolutional Neural Network and Random Forest Models Using PSSM and BERT Features to Identify Antioxidant Proteins

Tran, Hoang Vinh; Nguyen, Quang H.

doi:10.2174/1574893616666210820095144

Cited by 21 publications

(10 citation statements)

References 20 publications

“…Compared with the GBDT algorithm, XGBoost maximizes speed and efficiency. Random forest (RF) is an effective machine learning algorithm (Ao et al, 2022b; Tran and Nguyen, 2022; Naik et al, 2023) which is a random composition of many unrelated decision trees. When judging the category of a new sample, each RF decision tree makes an independent judgment and finally selects the category with the highest probability value.…”

Section: Results (mentioning)

confidence: 99%
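
The voting step described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes soft voting (averaging per-tree class probabilities and picking the largest), and the three stump functions, the feature names `f1` and `f2`, and the labels `pos` and `neg` are hypothetical stand-ins for trained decision trees.

```python
from collections import defaultdict


def forest_predict(trees, sample):
    """Average the class probabilities from every tree and return the top class."""
    totals = defaultdict(float)
    for tree in trees:
        # Each tree judges the sample independently of the others.
        for label, prob in tree(sample).items():
            totals[label] += prob
    averaged = {label: total / len(trees) for label, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


# Hypothetical hand-written stumps standing in for trained trees.
def stump_a(x):
    return {"pos": 0.9, "neg": 0.1} if x["f1"] > 0.5 else {"pos": 0.2, "neg": 0.8}


def stump_b(x):
    return {"pos": 0.6, "neg": 0.4} if x["f2"] > 0.3 else {"pos": 0.3, "neg": 0.7}


def stump_c(x):
    return {"pos": 0.4, "neg": 0.6} if x["f1"] + x["f2"] > 1.5 else {"pos": 0.1, "neg": 0.9}


label, probs = forest_predict([stump_a, stump_b, stump_c], {"f1": 0.7, "f2": 0.4})
print(label, probs)
```

Here two of the three stumps favor `pos`, and the averaged probability for `pos` is also the larger one, so the forest returns `pos`. Hard majority voting (counting one vote per tree) is a common alternative to the averaging assumed here.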

“…Compared with the GBDT algorithm, XGBoost maximizes speed and efficiency. Random forest (RF) is an effective machine learning algorithm (Ao et al, 2022b; Tran and Nguyen, 2022; Naik et al, 2023) which is a random composition of many … Finally, a strong classifier will be obtained when the minimum error rate or the maximum number of iterations is reached. The decision tree classification algorithm constructs a tree-type classification model from the training samples (Shabbir et al, 2021).…”

Section: Performance Of Different Classifiers (mentioning)

confidence: 99%

ACP-GBDT: An improved anticancer peptide identification method with gradient boosting decision tree

Chen et al. 2023

Front. Genet.

Cancer is one of the most dangerous diseases in the world, killing millions of people every year. Drugs composed of anticancer peptides have been used to treat cancer with low side effects in recent years. Therefore, identifying anticancer peptides has become a focus of research. In this study, an improved anticancer peptide predictor named ACP-GBDT, based on gradient boosting decision tree (GBDT) and sequence information, is proposed. To encode the peptide sequences included in the anticancer peptide dataset, ACP-GBDT uses a merged-feature composed of AAIndex and SVMProt-188D. A GBDT is adopted to train the prediction model in ACP-GBDT. Independent testing and ten-fold cross-validation show that ACP-GBDT can effectively distinguish anticancer peptides from non-anticancer ones. The comparison results of the benchmark dataset show that ACP-GBDT is simpler and more effective than other existing anticancer peptide prediction methods.

“…We need to convert sequences into vectors in mathematical representation (Amanatidou and Dedoussis, 2021; Dao et al, 2022a; Jeon et al, 2022; Li H et al, 2022; Nidhi et al, 2022; Sun et al, 2022; Tran and Nguyen, 2022; Wang et al, 2022; Yang et al, 2022). The amino acid composition (ACC) of the protein has a great impact on its subcellular location (Chou and Elrod, 1999a; Awais et al, 2021; Chou and Elrod, 1999b; Rout et al, 2022; Naseer et al, 2021; Manavalan and Patra, 2022; Shoombuatong et al, 2022).…”

Section: Feature Encoding (mentioning)

confidence: 99%
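
As an illustration of converting a sequence into a vector, the sketch below computes plain amino acid composition: the fraction of each of the 20 standard residues in a sequence. It is an assumption-laden example rather than the encoding used in the citing paper, which relies on amphiphilic pseudo amino acid composition; the function name, the residue ordering, and the choice to ignore non-standard characters are all hypothetical.

```python
# The 20 standard amino acids in one-letter code; the ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies for a protein sequence."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in sequence.upper():
        # Non-standard symbols such as X or gaps are skipped (an assumption).
        if residue in counts:
            counts[residue] += 1
            valid += 1
    if valid == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / valid for aa in AMINO_ACIDS]


# Toy sequence, not taken from any dataset mentioned above.
vec = amino_acid_composition("MKKLA")
print(len(vec), vec[AMINO_ACIDS.index("K")])
```

The resulting vector has a fixed length regardless of sequence length, which is what lets classifiers such as random forest or SVM consume proteins of different sizes.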

Prediction of apoptosis protein subcellular location based on amphiphilic pseudo amino acid composition

Deng et al. 2023

Front. Genet.

Introduction: Apoptosis proteins play an important role in the process of cell apoptosis, which makes the rate of cell proliferation and death reach a relative balance. The function of an apoptosis protein is closely related to its subcellular location, so it is of great significance to study the subcellular locations of apoptosis proteins. Many efforts in bioinformatics research have been aimed at predicting their subcellular location. However, the subcellular localization of apoptotic proteins needs to be carefully studied. Methods: In this paper, based on amphiphilic pseudo amino acid composition and support vector machine algorithm, a new method was proposed for the prediction of apoptosis proteins’ subcellular location. Results and Discussion: The method achieved good performance on three data sets. The Jackknife test accuracy of the three data sets reached 90.5%, 93.9% and 84.0%, respectively. Compared with previous methods, the prediction accuracies of APACC_SVM were improved.

“…We have compiled relevant research conducted by researchers in recent years and compared 11 models. In the process of feature extraction, there are many aspects of feature extraction, such as amino acid composition, protein secondary structure information, and physical and chemical properties of protein sequences, which play an important role in the identification of antioxidant proteins [5–7]. In the process of feature selection, we should not only select feature combinations with high contribution but also consider that the dimension of features should not be too high [8–10].…”

Section: Introduction (mentioning)

confidence: 99%

“…In the process of feature extraction, there are many aspects of feature extraction, such as amino acid composition, protein secondary structure information, and physical and chemical properties of protein sequences, which play an important role in the identification of antioxidant proteins [5–7]. In the process of feature selection, we should not only select feature combinations with high contribution but also consider that the dimension of features should not be too high [8–10]. A dimension that is too high will affect not only the efficiency of the model but also the accuracy of the model due to redundant features.…”

Section: Introduction (mentioning)

confidence: 99%

Machine learning‐based antioxidant protein identification model: Progress and evaluation

Meng, Pei, et al. 2023

J of Cellular Biochemistry

Efficient and accurate identification of antioxidant proteins is of great significance. In recent years, many models for identifying antioxidant proteins have been proposed, but the low sensitivity and high dimensionality of the models are common problems. The generalization ability of the model needs to be improved. Researchers have tried different feature extraction algorithms and feature selection algorithms to obtain the most effective feature combination and have chosen more appropriate classification algorithms and tools to improve model performance. In this article, we systematically reviewed the data set of the most frequently used antioxidant proteins and the method selection for each step of model establishment and discussed the characteristics of each method. We have conducted a detailed analysis of recent research and believe that the practical ability and efficiency of model application can be improved by reducing model dimensions. The key to improving the performance of antioxidant protein recognition models in the future may lie in feature selection, so this paper also focuses on the combination of feature extraction and selection steps in the analysis of the model building process.
