2019
DOI: 10.1111/ejss.12909

A note on knowledge discovery and machine learning in digital soil mapping

Abstract: In digital soil mapping, machine learning (ML) techniques are being used to infer a relationship between a soil property and the covariates. The information derived from this process is often translated into pedological knowledge. This mechanism is referred to as knowledge discovery. This study shows that knowledge discovery based on ML must be treated with caution. We show how pseudo‐covariates can be used to accurately predict soil organic carbon in a hypothetical case study. We demonstrate that ML methods c…

Cited by 77 publications (62 citation statements)
References 21 publications (26 reference statements)
“…variables that are indicative for potential drivers of soil organic C, and having a wide coverage of these potential drivers in the training data facilitates the interpretability of Random Forest models. Random Forests can only reveal associations, not causality (Wadoux et al 2020).…”
Section: Limits Of Random Forest Algorithms In Explaining Soil Organic C | Citation type: mentioning
confidence: 99%
“…It is possible to assess the importance of each covariate by shuffling the values of that covariate amongst the observation locations and calculating the reduction in prediction accuracy. However, Schmidt et al (2020) and Wadoux et al (2020) advise caution when inferring causal relationships from random forest models. Wadoux et al (2020) demonstrate that photographs of soil scientists projected across their study area can be utilised by a random forest to accurately map the soil carbon content.…”
Section: Machine Learning | Citation type: mentioning
confidence: 99%
“…Opening the "black box" is then necessary but not straightforward (see next section on interpretability), and is often reduced to the analysis of which environmental covariates are the most often used by the model to make a prediction (see for example Mahmoudabadi et al (2017) that ML algorithm should not be used for obtaining new soil knowledge because the ML algorithm aims at predicting a pattern rather than finding causal relationships. Wadoux et al (2019c) suggest to use calibrated ML models as a "hypothesis discovery" tool, in which the mechanisms conveyed by the calibrated ML model are supplied to the researcher for possible explanations of the soil process, which can then be confronted to experiments and principles of soil genesis. The challenge that then arises, noticed by Gahegan (2019) is the conversion of the mechanisms of the ML model (the model "knowledge") from a data language to a human one.…”
Section: Machine Learning and Pedological Knowledge | Citation type: mentioning
confidence: 99%
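
One of the citation statements above describes assessing covariate importance by shuffling a covariate's values amongst the observation locations and measuring the resulting drop in prediction accuracy. The following is a minimal sketch of that idea, not the method of any cited paper. The function names, the toy data, and the stand-in `toy_model` (used in place of a fitted random forest) are all hypothetical, and mean squared error is assumed as the accuracy measure.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each covariate is shuffled in turn.

    predict: function mapping one row of covariates to a prediction
             (stands in for any fitted model).
    X:       list of rows, each a list of covariate values.
    y:       list of observed values of the soil property.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle covariate j amongst the observation locations.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: the target depends on covariate 0 only.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [2.0 * row[0] for row in X]


def toy_model(row):
    """Hypothetical 'fitted' model that uses covariate 0 and ignores covariate 1."""
    return 2.0 * row[0]


importances = permutation_importance(toy_model, X, y)
print(importances)
```

A large value only indicates that the model relies on that covariate for prediction. As the citing authors stress, it reveals an association rather than a causal relationship.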