Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees

Elsayad, Alaa M.; Nassef, Ahmed M.; Al‐Dhaifallah, Mujahed; Elsayad, Khaled A.

doi:10.3390/ijerph17249322

Cited by 8 publications

(5 citation statements)

References 35 publications


“…We separated the dataset into random iterations of training (90%) and test data (10%) and trained the boosted C5.0 algorithm using 100 decision trees. The algorithm had 2 main meta parameters, including number of trees and minimum number of samples (MinCases) placed in at least 2 splits (Elsayad et al 2020). We used early stopping to prevent model overfitting (Caruana et al 2000, Jabbar and Khan 2015), which reduced the final number of trees.…”

Section: Methods (mentioning)

confidence: 99%

Structural complexity characterizes fine‐scale forest conditions used by Pacific martens

et al. 2023


When wildlife species exhibit unexpected associations with vegetation, replication of studies in different locales can illuminate whether patterns of use are consistent or divergent. Our objective was to describe fine-scale forest conditions used by Pacific martens (Martes caurina) at 2 study sites in northern California that differed in forest composition and past timber harvest. We identified denning and resting locations of radio-marked martens and sampled structure- and plot-level vegetation using standardized forest inventory methods between 2009-2021. Woody structures used by martens were significantly larger than randomly available structures across types (e.g., live tree, snag, log) and at both study sites. Den and rest structures occurred in areas characterized by higher numbers of logs and snags, lower numbers of live trees and stumps, larger diameter live trees and logs, and


Section: Methods (mentioning)

confidence: 99%

Structural complexity characterizes fine‐scale forest conditions used by Pacific martens

et al. 2023


“…C5.0 is an algorithm based on decision trees ( Elsayad et al., 2020 ), which involve a set of decision nodes, among which the root and each internal node are labeled with a question ( Pradhan, 2013 ). The arcs descend from each root node to leaf nodes, where a solution to the associated issue is offered.…”
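The structure described in this statement (a root and internal nodes that each carry a question, with leaves that carry the answer) can be sketched in a few lines of Python. This is only an illustration of a generic decision tree, not the C5.0 implementation; the feature names `x1` and `x2`, the thresholds, and the class labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """Terminal node: holds the predicted class label."""
    label: str


@dataclass
class Node:
    """Root or internal node: holds a question and two outgoing arcs."""
    question: str                      # human-readable form of the test
    test: Callable[[dict], bool]       # the question applied to one sample
    if_true: Union["Node", Leaf]       # arc followed when the test passes
    if_false: Union["Node", Leaf]      # arc followed when the test fails


def classify(tree: Union[Node, Leaf], sample: dict) -> str:
    """Follow arcs from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.test(sample) else node.if_false
    return node.label


# Hypothetical two-level tree over made-up features x1 and x2.
tree = Node(
    question="x1 > 2.5?",
    test=lambda s: s["x1"] > 2.5,
    if_true=Leaf("class_A"),
    if_false=Node(
        question="x2 > 0?",
        test=lambda s: s["x2"] > 0,
        if_true=Leaf("class_B"),
        if_false=Leaf("class_A"),
    ),
)

print(classify(tree, {"x1": 1.0, "x2": 1.0}))
```

A learning algorithm such as C5.0 chooses the questions from the training data; here they are written by hand only to show the traversal.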

Section: Methods (mentioning)

confidence: 99%

Near-infrared spectroscopy for early selection of waxy cassava clones via seed analysis

et al. 2023


Cassava (Manihot esculenta Crantz) starch consists of amylopectin and amylose, with its properties determined by the proportion of these two polymers. Waxy starches contain at least 95% amylopectin. In the food industry, waxy starches are advantageous, with pastes that are more stable towards retrogradation, while high-amylose starches are used as resistant starches. This study aimed to associate near-infrared spectrophotometry (NIRS) spectra with the waxy phenotype in cassava seeds and develop an accurate classification model for indirect selection of plants. A total of 1127 F2 seeds were obtained from controlled crosses performed between 77 F1 genotypes (wild-type, Wx_). Seeds were individually identified, and spectral data were obtained via NIRS using a benchtop NIRFlex N-500 and a portable SCiO device spectrometer. Four classification models were assessed for waxy cassava genotype identification: k-nearest neighbor algorithm (KNN), C5.0 decision tree (CDT), parallel random forest (parRF), and eXtreme Gradient Boosting (XGB). Spectral data were divided between a training set (80%) and a testing set (20%). The accuracy, based on NIRFlex N-500 spectral data, ranged from 0.86 (parRF) to 0.92 (XGB). The Kappa index displayed a similar trend as the accuracy, considering the lowest value for the parRF method (0.39) and the highest value for XGB (0.71). For the SCiO device, the accuracy (0.88−0.89) was similar among the four models evaluated. However, the Kappa index was lower than that of the NIRFlex N-500, and this index ranged from 0 (parRF) to 0.16 (KNN and CDT). Therefore, despite the high accuracy these last models are incapable of correctly classifying waxy and non-waxy clones based on the SCiO device spectra. A confusion matrix was performed to demonstrate the classification model results in the testing set. For both NIRS, the models were efficient in classifying non-waxy clones, with values ranging from 96−100%. 
However, the NIRS differed in the potential to predict waxy genotype class. For the NIRFlex N-500, the percentage ranged from 30% (parRF) to 70% (XGB). In general, the models tended to classify waxy genotypes as non-waxy, mainly SCiO. Therefore, the use of NIRS can perform early selection of cassava seeds with a waxy phenotype.
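The abstract's point that a model can reach an accuracy near 0.88 while its Kappa index stays near 0 follows from how Cohen's kappa corrects for chance agreement. The sketch below computes both metrics from a confusion matrix. The counts are hypothetical and chosen only to show the effect on an imbalanced test set; they are not the study's data.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples whose actual class is i
    and whose predicted class is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n

    # Chance agreement: product of actual and predicted class marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    if expected == 1.0:
        # Degenerate case: only one class is present and predicted.
        return observed, 0.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical test set: 88 non-waxy and 12 waxy samples.
# Rows are actual classes, columns are predicted classes.
majority_only = [[88, 0],   # every sample predicted as non-waxy
                 [12, 0]]
informative = [[85, 3],     # most waxy samples are recovered
               [4, 8]]

for name, m in (("majority_only", majority_only), ("informative", informative)):
    acc, kappa = accuracy_and_kappa(m)
    print(f"{name}: accuracy={acc:.2f}, kappa={kappa:.2f}")
```

In the first matrix the classifier never predicts the minority class, so its accuracy equals the share of the majority class and kappa is zero. This is the pattern the abstract describes for the SCiO spectra.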


“…The features that do not contribute to the splits are removed from the final model. While C5 algorithms are easy to implement and interpret, it requires categorical (ordinal/nominal) data as target variable and may not work well on small datasets [ 31 , 36 ].…”

Section: Methods (mentioning)

confidence: 99%

Comparison of machine learning algorithms applied to symptoms to determine infectious causes of death in children: national survey of 18,000 verbal autopsies in the Million Death Study in India

2021


Background: Machine learning (ML) algorithms have been successfully employed for prediction of outcomes in clinical research. In this study, we have explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS). Methods: From MDS, 18826 unique childhood deaths at ages 1–59 months during the time period 2004–13 were selected for generating the prediction models of which over 70% of deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML-based algorithms such as support vector machine, gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, classification and regression tree were used for building the CoD prediction models. Results: SVM algorithm was the best performer with a prediction accuracy of over 0.8. The highest accuracy was found for diarrhoeal diseases (accuracy = 0.97) and the lowest was for meningitis/encephalitis (accuracy = 0.80). The top signs/symptoms for classification of these CoDs were also extracted for each of the diseases. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis. Conclusions: Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.
