2018
DOI: 10.1038/s41598-018-28244-w

Predicting reference soil groups using legacy data: A data pruning and Random Forest approach for tropical environment (Dano catchment, Burkina Faso)

Abstract: Predicting taxonomic classes can be challenging with datasets subject to substantial irregularities due to the involvement of many surveyors. A data pruning approach was used in the present study to reduce such sources of error by exploring whether different data pruning methods, which result in different subsets of a major reference soil group (RSG) – the Plinthosols – would lead to an increase in prediction accuracy of the minor soil groups by using Random Forest (RF). This method was compared to the random ove…

Cited by 48 publications (30 citation statements)
References 79 publications (84 reference statements)
“…Therefore, the trained ML algorithms on the ROS resampled data could not precisely be generalized to the unseen data. This is one reason why we obtained a poor performance of ML algorithms using ROS resampled data, which is in line with the findings of Hounkpatin et al (), who pointed out the poor power of generalization of RF models trained by ROS resampled data. Contrary to our findings, Sharififar, Sarmadian, Malone, and Minasny () indicated a significant improvement in ML learning when they made balanced soil data using ROS.…”
Section: Results
Citation type: supporting
confidence: 92%
“…Generally speaking, the performances of ML algorithms trained on the original imbalanced dataset indicated that the two ensemble‐based models (RF and XGBoost) show high and fairly similar accuracy (Table ). Our findings are in line with results of several DSM literature reviews, which all confirmed the power of ensemble‐based models compared to the other common ML algorithms (Brungard et al, ; Hounkpatin et al, ). However, a closer inspection of the calculated recall values of individual classes revealed that a considerable number of minority classes will be misclassified as majority classes, such as Chernozems, Phaeozems and Solonetz (Table ).…”
Section: Results
Citation type: supporting
confidence: 91%
“…Firstly, soil sample locations were identified according to the number of samples in the respective soil polygon map of each upazila. Then, the samples were matched individually within the grids following objective selection [35] based on physiography, land types and soil series.…”
Section: Soil Legacy Data Cropping Intensity and Fertilizer
Citation type: mentioning
confidence: 99%