2020
DOI: 10.21203/rs.3.rs-132775/v1
Preprint

Determining Threshold Value on Information Gain Feature Selection to Increase Speed and Prediction Accuracy of Random Forest

Abstract: Feature selection is a preprocessing technique that aims to remove unnecessary features and speed up the algorithm's work process. One feature selection technique is to calculate the information gain value of each feature in a dataset. From the information gain values obtained, a threshold value is then determined and used to perform the feature selection. Generally, the threshold value is chosen freely, or a value of 0.05 is used. This study proposed determining the threshold value using the standard…
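The procedure the abstract describes — score each feature by information gain, then keep those at or above a threshold such as 0.05 — can be sketched as follows. This is a minimal illustration on a toy discrete dataset, not the paper's implementation; the entropy-based IG formula is the standard one, and the data is invented.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """IG of a discrete feature: H(labels) minus the weighted
    conditional entropy H(labels | feature)."""
    total = entropy(labels)
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    cond = sum(w * entropy(labels[feature == v])
               for v, w in zip(values, weights))
    return total - cond

# Toy dataset: columns 0 and 1 are informative, column 2 is noise.
X = np.array([[0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1]])
y = np.array([0, 0, 1, 1])

gains = [information_gain(X[:, j], y) for j in range(X.shape[1])]
threshold = 0.05  # the fixed cut-off mentioned in the abstract
selected = [j for j, g in enumerate(gains) if g >= threshold]
print(gains)     # columns 0 and 1 have IG = 1.0, column 2 has IG = 0.0
print(selected)  # [0, 1]
```

The paper's contribution is how the threshold itself is chosen (the abstract is truncated at that point); the fixed value 0.05 above is only the conventional baseline it compares against.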

Cited by 3 publications (3 citation statements)
References 26 publications (28 reference statements)
“…The information gain rate is the information gain divided by the amount of divided information [28,29]. The training data set S consists of s samples.…”
Section: C4.5 Algorithm: Inductive Learning Mechanism
Mentioning, confidence: 99%
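The citing work's definition — information gain divided by the split information, as used by C4.5 — can be written out in a short sketch. The names and toy data below are illustrative, not from the cited paper.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """Information gain divided by split information (C4.5's gain rate)."""
    total = entropy(labels)
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    gain = total - sum(w * entropy(labels[feature == v])
                       for v, w in zip(values, weights))
    # Split information: entropy of the partition the feature induces.
    split_info = -np.sum(weights * np.log2(weights))
    return gain / split_info if split_info > 0 else 0.0

feature = np.array([0, 0, 1, 1])
labels = np.array([0, 0, 1, 1])
print(gain_ratio(feature, labels))  # 1.0: gain = 1 bit, split info = 1 bit
```

Dividing by split information penalizes attributes that split the data into many small branches, which plain information gain tends to favor.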
“…To obtain the most appropriate words for predictive modeling, we applied the filter method to select features based on information gain (IG) [34,35]. Words whose IG scores are greater than or equal to 0.05 [34] are chosen as features. Each selected word (or feature) is then weighted by the term frequency-inverse document frequency (tf-idf) scheme.…”
Section: Figure 1 Methods Of Developing the Polarity Label Analyzer
Mentioning, confidence: 99%
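The pipeline this citation describes — keep only words passing the IG cut-off, then weight them by tf-idf — can be sketched with a hypothetical mini-corpus. The documents and the set of IG-selected words below are invented for illustration; only the tf-idf weighting of a pre-filtered vocabulary reflects the cited description.

```python
import math
from collections import Counter

# Hypothetical tokenized corpus; in the cited work the vocabulary is first
# filtered so only words with IG >= 0.05 remain.
docs = [["good", "service"], ["bad", "service"], ["good", "food"]]
selected = {"good", "bad"}  # assumed survivors of the IG cut-off

def tfidf(docs, vocab):
    """tf-idf weights restricted to a pre-selected vocabulary."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc) if w in vocab)
    weighted = []
    for doc in docs:
        tf = Counter(w for w in doc if w in vocab)
        weighted.append({w: (tf[w] / len(doc)) * math.log(n / df[w])
                         for w in tf})
    return weighted

for row in tfidf(docs, selected):
    print(row)
```

Words filtered out by IG ("service", "food") never enter the tf-idf matrix, which is the point of running the filter before weighting.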
“…Furthermore, to build the best attribute set, we need a cut-off value to pick attributes from the final ranked list obtained after the aggregation procedure. In this study, three different threshold values were utilized to reduce the data and pick the most appropriate attribute set [23]. The threshold values used are:…”
Section: Threshold Values
Mentioning, confidence: 99%
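Applying a cut-off to a ranked attribute list, as this citation describes, amounts to a simple filter. The excerpt is truncated before the three actual threshold values, so the list and cut-off below are placeholders only.

```python
# Hypothetical ranked list of (attribute, aggregated score), descending.
ranked = [("f3", 0.91), ("f1", 0.54), ("f7", 0.22), ("f2", 0.04)]
cutoff = 0.2  # illustrative value; the paper's three thresholds are not shown

# Keep every attribute whose score meets the cut-off.
chosen = [name for name, score in ranked if score >= cutoff]
print(chosen)  # ['f3', 'f1', 'f7']
```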