2017
DOI: 10.1080/13658816.2017.1346255
Estimating the prediction performance of spatial models via spatial k-fold cross validation

Abstract: In machine learning, one often assumes the data are independent when evaluating model performance. However, this rarely holds in practice. Geographic information data sets are an example: the closer data points are geographically, the stronger the dependencies between them. This phenomenon, known as spatial autocorrelation (SAC), causes standard cross-validation (CV) methods to produce optimistically biased prediction performance estimates for spatial models, which can result in increased costs …
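To make the bias concrete, here is a minimal sketch (not from the paper) on synthetic data: the target varies smoothly over space, so a random k-fold split leaks information from neighbours into the test folds, while holding out whole spatial blocks does not. The clustering-based blocking and all parameter values are illustrative assumptions.

```python
# A minimal sketch (not from the paper): synthetic spatially autocorrelated
# data, comparing random k-fold CV with a spatially blocked k-fold.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(500, 2))          # point locations

# Target varies smoothly over space, so nearby points are strongly
# correlated -- a crude stand-in for SAC.
y = np.sin(coords[:, 0] / 10) + np.cos(coords[:, 1] / 10) + rng.normal(0, 0.1, 500)
X = coords + rng.normal(0, 1, coords.shape)          # noisy covariates

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Random folds mix neighbours across train/test -> optimistic estimate.
random_scores = cross_val_score(model, X, y,
                                cv=KFold(10, shuffle=True, random_state=0))

# Spatial blocking holds out whole regions (here: k-means clusters).
blocks = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(coords)
blocked_scores = cross_val_score(model, X, y, cv=GroupKFold(10), groups=blocks)

print(random_scores.mean(), blocked_scores.mean())   # blocked is typically lower
```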

Cited by 117 publications (87 citation statements). References 37 publications (35 reference statements).
“…To estimate a model's prediction performance where the effect of SAC has been reduced, Pohjankukka et al. (2014, 2017) and Le Rest et al. (2014) proposed spatial cross-validation (SCV) to be used for this purpose. The idea in SCV is to estimate a model's prediction performance for a test point r units away from the closest known instances.…”
Section: Spatial Bias and SCV
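As a concrete illustration of that idea, the following is a minimal leave-one-out-style sketch of spatial CV under the stated interpretation; the names (skcv_predictions, model_factory) and the Euclidean distance choice are assumptions for illustration, not the paper's own code.

```python
import numpy as np

def skcv_predictions(X, y, coords, r, model_factory):
    """Leave-one-out spatial CV: for each test point, drop every
    training point within r units (the 'dead zone') before fitting."""
    preds = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        train = dist >= r        # keep only points outside the dead zone
        train[i] = False         # never train on the test point itself
        model = model_factory()
        model.fit(X[train], y[train])
        preds[i] = model.predict(X[i:i + 1])[0]
    return preds
```

With r = 0 this reduces to ordinary leave-one-out CV; growing r pushes the nearest training data progressively farther from each test point.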
“…This is done by altering the data in the CV procedure so that a test point is always at least r units away from the training data. Following Pohjankukka et al. (2017), we call this left-out area the dead zone. SCV produces a prediction performance estimate of our model as a function of r, i.e.…”
Section: Spatial Bias and SCV
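Reusing the hypothetical skcv_predictions sketch from above (and the X, y, coords arrays from the synthetic-data sketch), the estimate-as-a-function-of-r curve could be traced like this; the radii, the k-NN model, and the MSE metric are arbitrary illustrative choices.

```python
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

# Error as a function of the dead-zone radius r: as r grows, the nearest
# training data move farther away and the estimate typically worsens.
for r in (0, 5, 10, 20, 40):
    preds = skcv_predictions(X, y, coords, r,
                             lambda: KNeighborsRegressor(n_neighbors=5))
    print(f"r={r}: MSE={mean_squared_error(y, preds):.3f}")
```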
“…The most sophisticated validation sampling techniques (hold-out and k-fold) assume data in both the test and training sets to be independent of each other. This is an assumption that may be unrealistic with datasets containing SAC, especially if the purpose of the modelling is interpolation or close-proximity extrapolation (Pohjankukka et al. 2017). As such, four sampling techniques are considered, three of which consider spatial dependence for comparison (see 'data sampling' in Figure 3):…”
Section: Data Sampling for Cross-Validation
“…(2) spatially stratified 10-fold cross-validation (spatially stratified k-fold cross-validation [SSKCV]) on the full dataset of 3669; (3) chequerboard holdout on a training set of 1832 properties, with a test set of 1837 properties; (4) spatial k-fold cross-validation (SKCV) (Pohjankukka et al. 2017) on samples of the entire dataset, with each sample including 3187 ± 135 properties for each fold.…”
Section: Data Sampling for Cross-Validation
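For comparison with technique (3), a chequerboard split could be sketched as below; the grid origin, cell size, and function name are assumptions for illustration, not details from the cited study.

```python
import numpy as np

def chequerboard_split(coords, cell_size):
    """Split points into train/test by alternating grid cells
    (chequerboard pattern), so same-set cells only touch diagonally."""
    cells = np.floor(coords / cell_size).astype(int)
    test = (cells[:, 0] + cells[:, 1]) % 2 == 0   # 'black' squares -> test
    return ~test, test

# Example usage with the coords array from the earlier sketch:
# train_mask, test_mask = chequerboard_split(coords, cell_size=10.0)
```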