2020
DOI: 10.3390/ijerph17249322

Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees

Abstract: Substances that do not degrade over time have proven to be harmful to the environment and are dangerous to living organisms. Being able to predict the biodegradability of substances without costly experiments is useful. Recently, quantitative structure–activity relationship (QSAR) models have offered effective solutions to this problem. However, molecular descriptor datasets usually suffer from unbalanced class distributions, which adversely affect the efficiency and generalization of …

Cited by 8 publications (5 citation statements)
References 35 publications
“…We separated the dataset into random iterations of training (90%) and test data (10%) and trained the boosted C5.0 algorithm using 100 decision trees. The algorithm had 2 main meta parameters, including number of trees and minimum number of samples (MinCases) placed in at least 2 splits (Elsayad et al 2020). We used early stopping to prevent model overfitting (Caruana et al 2000, Jabbar and Khan 2015), which reduced the final number of trees.…”
Section: Methods · mentioning
confidence: 99%
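The training procedure quoted above (random 90%/10% splits, a boosted ensemble capped at 100 trees, a MinCases constraint, early stopping) can be sketched in plain Python. This is not C5.0: as an assumption, each "tree" is a one-split decision stump, the boosting loop is AdaBoost-style, and the names `boost`, `fit_stump`, `split_90_10`, `run_iteration` and the `patience` setting are hypothetical. Because a stump has exactly two branches, the MinCases rule ("at least 2" branches with enough samples) reduces to requiring both branches to hold at least `min_cases` rows. The quote does not say which data drove early stopping, so the sketch holds out a slice of the 90% training portion rather than touching the 10% test split.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Stump:
    """One-split decision tree standing in for a full C5.0 tree (assumption)."""
    feature: int
    threshold: float
    polarity: int  # +1: predict +1 when x[feature] > threshold, else -1

    def predict(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def fit_stump(X, y, w, min_cases):
    """Lowest weighted-error stump whose two branches each hold >= min_cases rows."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = sum(1 for row in X if row[f] <= t)
            if left < min_cases or len(X) - left < min_cases:
                continue  # MinCases-style check: skip splits with a tiny branch
            for pol in (1, -1):
                s = Stump(f, t, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if s.predict(xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err


def predict(ensemble, x):
    """Weighted vote of the boosted stumps; labels are -1/+1."""
    score = sum(alpha * s.predict(x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    return sum(predict(ensemble, xi) == yi for xi, yi in zip(X, y)) / len(y)


def boost(X_fit, y_fit, X_val, y_val, max_trees=100, min_cases=2, patience=5):
    """AdaBoost-style boosting, stopped early when validation accuracy stalls."""
    w = [1.0 / len(X_fit)] * len(X_fit)
    ensemble, best_acc, best_len, stale = [], -1.0, 0, 0
    for _ in range(max_trees):
        stump, err = fit_stump(X_fit, y_fit, w, min_cases)
        if stump is None or err >= 0.5:
            break  # no admissible split, or no better than chance
        err = max(err, 1e-10)  # avoid division by zero on a perfect split
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified rows, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump.predict(xi))
             for xi, yi, wi in zip(X_fit, y_fit, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        acc = accuracy(ensemble, X_val, y_val)
        if acc > best_acc:
            best_acc, best_len, stale = acc, len(ensemble), 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping
    return ensemble[:best_len]  # keep the best-scoring prefix of trees


def split_90_10(X, y, seed):
    """One random 90% / 10% partition of the rows."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.9 * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])


def run_iteration(X, y, seed, max_trees=100, min_cases=2):
    """Train on 90%, score on the held-out 10%; early stopping uses a slice of the 90%."""
    X_tr, y_tr, X_te, y_te = split_90_10(X, y, seed)
    X_fit, y_fit, X_val, y_val = split_90_10(X_tr, y_tr, seed + 1)
    model = boost(X_fit, y_fit, X_val, y_val, max_trees, min_cases)
    return accuracy(model, X_te, y_te), len(model)
```

Calling `run_iteration(X, y, seed)` for several seeds and averaging the returned accuracies mimics the "random iterations" in the quote; the second return value shows how many trees early stopping kept out of the cap of 100.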
“…C5.0 is an algorithm based on decision trees ( Elsayad et al., 2020 ), which involve a set of decision nodes, among which the root and each internal node are labeled with a question ( Pradhan, 2013 ). The arcs descend from each root node to leaf nodes, where a solution to the associated issue is offered.…”
Section: Methods · mentioning
confidence: 99%
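The node, arc and leaf structure described in this excerpt can be modelled with two small types: an internal node carries a question and maps each answer to an outgoing arc, and a leaf carries the solution. The tree below is hand-built for illustration only; the type names, the descriptor names `descriptor_a` and `descriptor_b`, the thresholds and the class labels are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # the solution offered at the end of a path


@dataclass
class Node:
    question: str                # human-readable question labelling the node
    test: Callable[[dict], str]  # answers the question; names the arc to follow
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree: Union[Node, Leaf], sample: dict) -> str:
    """Descend along arcs from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[tree.test(sample)]
    return tree.label


def trace(tree: Union[Node, Leaf], sample: dict) -> List[Tuple[str, str]]:
    """Record each (question, answer) pair on the path from root to leaf."""
    steps = []
    while isinstance(tree, Node):
        answer = tree.test(sample)
        steps.append((tree.question, answer))
        tree = tree.children[answer]
    return steps


# Hand-built example; descriptor names, thresholds and labels are hypothetical.
example_tree = Node(
    question="descriptor_a > 2?",
    test=lambda s: "yes" if s["descriptor_a"] > 2 else "no",
    children={
        "yes": Leaf("non-biodegradable"),
        "no": Node(
            question="descriptor_b <= 0.5?",
            test=lambda s: "yes" if s["descriptor_b"] <= 0.5 else "no",
            children={"yes": Leaf("biodegradable"), "no": Leaf("non-biodegradable")},
        ),
    },
)
```

`trace` returns the sequence of questions answered on the way down, which is what makes a single tree easy to read.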
“…The features that do not contribute to the splits are removed from the final model. While C5 algorithms are easy to implement and interpret, they require categorical (ordinal/nominal) data as the target variable and may not work well on small datasets [ 31 , 36 ].…”
Section: Methods · mentioning
confidence: 99%
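Both points in this excerpt can be made concrete: only features that appear in some split survive in the final model, and the target must be a class label. The sketch below assumes a hypothetical `TreeNode` type with string-valued branches; it collects the split features of a fitted tree and drops every other column. The `check_categorical_target` guard is an illustrative heuristic, not part of any C5.0 implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TreeNode:
    feature: Optional[str] = None  # None marks a leaf
    children: Dict[str, "TreeNode"] = field(default_factory=dict)
    label: Optional[str] = None    # class stored at a leaf


def features_used(node: TreeNode) -> Set[str]:
    """Collect every feature that appears in at least one split."""
    if node.feature is None:
        return set()
    used = {node.feature}
    for child in node.children.values():
        used |= features_used(child)
    return used


def prune_columns(rows: List[dict], tree: TreeNode) -> List[dict]:
    """Drop columns the fitted tree never splits on."""
    keep = features_used(tree)
    return [{k: v for k, v in row.items() if k in keep} for row in rows]


def check_categorical_target(y: list) -> None:
    """Heuristic guard: reject float targets, which suggest a continuous variable."""
    if any(isinstance(v, float) for v in y):
        raise ValueError("target looks continuous; encode it as class labels first")
```

With this design, a column that the tree never tests simply disappears from the pruned rows, which mirrors the statement that non-contributing features are removed from the final model.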