Determining threshold value on information gain feature selection to increase speed and prediction accuracy of random forest

Prasetiyowati, Maria Irmina; Maulidevi, Nur Ulfa; Surendro, Kridanto

doi:10.1186/s40537-021-00472-4

Cited by 48 publications

(39 citation statements)

References 40 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…This study followed previous studies ( Prasetiyowati, Maulidevi & Surendro, 2021 ; Prasetiyowati, Maulidevi & Surendro, 2020a ; Prasetiyowati, Maulidevi & Surendro, 2020b ). The researchers began this study by using the Correlation-based Feature Selection (CBF) for feature selection.…”

Section: Introductionmentioning

confidence: 72%

“…Random Forest is a classification algorithm based on the random selection of trees ( Gounaridis & Koukoulas, 2016 ; Prasetiyowati, Maulidevi & Surendro, 2020a ; Prasetiyowati, Maulidevi & Surendro, 2021 ), thereby making it uninformative as a tool used to build the decision tree ( Breiman, 2001 ; Prasetiyowati, Maulidevi & Surendro, 2021 ; Scornet, Biau & Vert, 2015 ). However, this process allows the selected feature to be uninformative.…”

Section: Introductionmentioning

confidence: 99%

“…Therefore, improving the feature selection process is necessary to make it informative with a faster execution time. Several studies have proposed the feature selection process for Random Forest ( Adnan, 2014 ; Prasetiyowati, Maulidevi & Surendro, 2021 ; Sun et al, 2020 ; Ye et al, 2013 ; Zhang & Suganthan, 2014 ), including the use of IG with a threshold based on the standard deviation value ( Prasetiyowati, Maulidevi & Surendro, 2021 ). Zhang & Suganthan (2014) proposed a new method in Random Forest by increasing tree diversity by combining a different rotation space at the root node.…”

Section: Introductionmentioning

confidence: 99%

“…Several other studies use the calculation of the frequency of each feature to determine the threshold value as a subset of the final features ( Tsai & Sung, 2020 ). However, some also use the standard deviation to determine the threshold ( Prasetiyowati, Maulidevi & Surendro, 2021 ; Sindhu & Radha, 2020 ).…”

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy

Prasetiyowati

Maulidevi

Surendro

2022

PeerJ Computer Science

Self Cite

View full text Add to dashboard Cite

One of the significant purposes of building a model is to increase its accuracy within a shorter timeframe through the feature selection process. It is carried out by determining the importance of available features in a dataset using Information Gain (IG). The process is used to calculate the amounts of information contained in features with high values selected to accelerate the performance of an algorithm. In selecting informative features, a threshold value (cut-off) is used by the Information Gain (IG). Therefore, this research aims to determine the time and accuracy-performance needed to improve feature selection by integrating IG, the Fast Fourier Transform (FFT), and Synthetic Minor Oversampling Technique (SMOTE) methods. The feature selection model is then applied to the Random Forest, a tree-based machine learning algorithm with random feature selection. A total of eight datasets consisting of three balanced and five imbalanced datasets were used to conduct this research. Furthermore, the SMOTE found in the imbalance dataset was used to balance the data. The result showed that the feature selection using Information Gain, FFT, and SMOTE improved the performance accuracy of Random Forest.

show abstract

Section: Introductionmentioning

confidence: 72%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy

Prasetiyowati

Maulidevi

Surendro

2022

PeerJ Computer Science

Self Cite

View full text Add to dashboard Cite

show abstract

“…ML can select significant predictors and exclude collinear variables, whereas unsupervised ML uses all the predictors with the same weights. Weights of RSA traits affected ML models in numerous other studies [ 80 – 83 ]. In this study, we segmented root crowns and used RhizoVision Explorer to extract root traits for use in these models.…”

Section: Discussionmentioning

confidence: 99%

Objective Phenotyping of Root System Architecture Using Image Augmentation and Machine Learning in Alfalfa (Medicago sativa L.)

York

Seethepalli

et al. 2022

Plant Phenomics

View full text Add to dashboard Cite

Active breeding programs specifically for root system architecture (RSA) phenotypes remain rare; however, breeding for branch and taproot types in the perennial crop alfalfa is ongoing. Phenotyping in this and other crops for active RSA breeding has mostly used visual scoring of specific traits or subjective classification into different root types. While image-based methods have been developed, translation to applied breeding is limited. This research is aimed at developing and comparing image-based RSA phenotyping methods using machine and deep learning algorithms for objective classification of 617 root images from mature alfalfa plants collected from the field to support the ongoing breeding efforts. Our results show that unsupervised machine learning tends to incorrectly classify roots into a normal distribution with most lines predicted as the intermediate root type. Encouragingly, random forest and TensorFlow-based neural networks can classify the root types into branch-type, taproot-type, and an intermediate taproot-branch type with 86% accuracy. With image augmentation, the prediction accuracy was improved to 97%. Coupling the predicted root type with its prediction probability will give breeders a confidence level for better decisions to advance the best and exclude the worst lines from their breeding program. This machine and deep learning approach enables accurate classification of the RSA phenotypes for genomic breeding of climate-resilient alfalfa.

show abstract

Machine Learning Algorithms for Prediction of COVID-19 in Early Stages Using Explainable AI Approach

Kaur,

Singh,

Hans

et al. 2024

Algorithms for Intelligent Systems

View full text Add to dashboard Cite

Determining threshold value on information gain feature selection to increase speed and prediction accuracy of random forest

Cited by 48 publications

References 40 publications

The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy

The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy

Objective Phenotyping of Root System Architecture Using Image Augmentation and Machine Learning in Alfalfa (Medicago sativa L.)

Machine Learning Algorithms for Prediction of COVID-19 in Early Stages Using Explainable AI Approach

Contact Info

Product

Resources

About