Assessing the Effect of Training Sampling Design on the Performance of Machine Learning Classifiers for Land Cover Mapping Using Multi-Temporal Remote Sensing Data and Google Earth Engine

Shetty, Shobitha; Gupta, P. K.; Belgiu, Mariana; Srivastav, S. K.

doi:10.3390/rs13081433

Cited by 65 publications

(37 citation statements)

References 52 publications

“…In most methods, it is not possible to obtain accuracy on the training set, but only on the validation and test set. In many publications the accuracy of validation (OA) is reported, which in almost all cases is above 80% (e.g., SVM = 97.7% [7], SVM = 98.96% [20], SVM (modified) = 98.07% [13], RF = 93% [22], RF = 86.98% [23], RF = 83.96% [39], Dynamic Time Warping algorithm, NDVI time series classification = 72-89%, multi-band classification = 76-88% [40]). In some cases, the accuracy for test data is also delivered: 84.2% [7], 88.94% [20], which means in the case of 13.5% [7] less value than the accuracy of the validation and in the case of 10.02% [20] lower.…”

Section: Discussion (mentioning)

confidence: 99%

Reliable Crops Classification Using Limited Number of Sentinel-2 and Sentinel-1 Images

et al. 2021

The study presents an analysis of the possible use of a limited number of Sentinel-2 and Sentinel-1 images to check whether the crop declarations that EU farmers submit to receive subsidies are true. The declarations used in the research were randomly divided into two independent sets (training and test). Based on the training set, supervised classification of both single images and their combinations was performed using the random forest algorithm in SNAP (ESA) and our own Python scripts. A comparative accuracy analysis was performed on the basis of two forms of confusion matrix (the full confusion matrix commonly used in remote sensing and the binary confusion matrix used in machine learning) and various accuracy metrics (overall accuracy, accuracy, specificity, sensitivity, etc.). The highest overall accuracy (81%) was obtained in the simultaneous classification of multitemporal images (three Sentinel-2 and one Sentinel-1). An unexpectedly high accuracy (79%) was achieved in the classification of one Sentinel-2 image at the end of May 2018. Noteworthy is the fact that the accuracy of the random forest method trained on the entire training set equals 80%, whereas using the sampling method it is ca. 50%. Based on the analysis of various accuracy metrics, it can be concluded that the metrics used in machine learning, for example specificity and accuracy, are always higher than the overall accuracy. These metrics should be used with caution, because unlike the overall accuracy, to calculate these metrics, not only true positives but also false positives are used as positive results, giving the impression of higher accuracy. Correct calculation of overall accuracy values is essential for comparative analyses. Reporting the mean accuracy value for the classes as overall accuracy gives a false impression of high accuracy. In our case, the difference was 10–16% for the validation data, and 25–45% for the test data.
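The gap the abstract describes between overall accuracy and the per-class binary metrics can be illustrated with a small sketch. The confusion matrix below is hypothetical (it is not taken from the paper), and the function names are illustrative only. The sketch assumes rows are reference classes and columns are predicted classes, and that the "binary" accuracy is computed one-vs-rest, so true negatives count as correct.

```python
def overall_accuracy(cm):
    """Overall accuracy: correctly classified samples (diagonal) / all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def binary_metrics(cm, k):
    """One-vs-rest accuracy, sensitivity and specificity for class k.

    Assumes rows = reference classes, columns = predicted classes.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # reference k, predicted as another class
    fp = sum(row[k] for row in cm) - tp  # another class, predicted as k
    tn = total - tp - fn - fp            # everything not involving class k
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical 3-class confusion matrix, for illustration only.
cm = [
    [50, 10, 0],
    [5, 30, 5],
    [0, 10, 40],
]

oa = overall_accuracy(cm)
per_class = [binary_metrics(cm, k) for k in range(len(cm))]
mean_binary_accuracy = sum(m["accuracy"] for m in per_class) / len(per_class)

print(f"overall accuracy:     {oa:.3f}")
print(f"mean binary accuracy: {mean_binary_accuracy:.3f}")
```

In this one-vs-rest setup each misclassified sample lowers the binary accuracy of only two classes, so with K classes the mean binary accuracy equals 1 - 2(1 - OA)/K, which cannot fall below OA once K is 2 or more. That is one way to read the abstract's warning against reporting the class-mean accuracy as overall accuracy.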

“…Not as much research is done into determining how data sampling strategies affect ML classifiers. The authors in [101] compared different data sampling strategies and their effects on how different ML classifiers performed on LULC tasks. A multi-seasonal sample set was collected in [88] for global land cover mapping in 2015 from Landsat 8 images.…”

Section: Land Cover Classification (mentioning)

confidence: 99%

Google Earth Engine and Artificial Intelligence (AI): A Comprehensive Review

et al. 2022

Remote sensing (RS) plays an important role in gathering data in many critical domains (e.g., global climate change, risk assessment and vulnerability reduction of natural hazards, resilience of ecosystems, and urban planning). Retrieving, managing, and analyzing large amounts of RS imagery poses substantial challenges. Google Earth Engine (GEE) provides a scalable, cloud-based, geospatial retrieval and processing platform. GEE also provides access to the vast majority of freely available, public, multi-temporal RS data and offers free cloud-based computational power for geospatial data analysis. Artificial intelligence (AI) methods are a critical enabling technology for automating the interpretation of RS imagery, particularly in object-based domains, so the integration of AI methods into GEE represents a promising path towards operationalizing automated RS-based monitoring programs. In this article, we provide a systematic review of relevant literature to identify recent research that incorporates AI methods in GEE. We then discuss some of the major challenges of integrating GEE and AI and identify several priorities for future research. We developed an interactive web application designed to allow readers to intuitively and dynamically review the publications included in this literature review.

“…The random forest is an ensemble method specially designed for a decision tree classifier, and the selection of random attributes is further added to its training process. Using similar parameters to those used for the decision tree, the random forest model is easy to implement and shows good effects [32,33]. In this research, parameters are determined by using cross-validation and grid search methods.…”

Section: Random Forest (mentioning)

confidence: 99%

“…(2) Machine learning methods: These methods feature pixel-based pattern recognition analysis, mainly including supervised and unsupervised classification techniques. The supervised methods mainly include neural network [21][22][23][24][25], support vector machine (SVM) [26][27][28], logistic regression [29,30], and random forest [31][32][33], and the unsupervised classification methods mainly include K-means clustering [34] and ISODATA clustering [35,36] methods. The machine learning algorithm has been widely used in remote sensing water extraction due to its high accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

Comparative Analysis of Machine Learning Algorithms in Automatic Identification and Extraction of Water Boundaries

Fan, Qin, et al. 2021

Applied Sciences

Monitoring open water bodies accurately is important for assessing the role of ecosystem services in the context of human survival and climate change. There are many methods available for water body extraction based on remote sensing images, such as the normalized difference water index (NDWI), modified NDWI (MNDWI), and machine learning algorithms. Based on Landsat-8 remote sensing images, this study focuses on the effects of six machine learning algorithms and three threshold methods used to extract water bodies, evaluates the transfer performance of models applied to remote sensing images in different periods, and compares the differences among these models. The results are as follows. (1) Various algorithms require different numbers of samples to reach their optimal performance. The logistic regression algorithm requires a minimum of 110 samples. As the number of samples increases, the order of the optimal model is support vector machine, neural network, random forest, decision tree, and XGBoost. (2) The accuracy evaluation performance of each machine learning algorithm on the test set cannot represent the local area performance. (3) When these models are directly applied to remote sensing images in different periods, the AUC indicators of each machine learning algorithm for three regions all show a significant decline, with a decrease range of 0.33–66.52%, and the differences among the different algorithm performances in the three areas are obvious. Generally, the decision tree algorithm has good transfer performance among the machine learning algorithms with area under curve (AUC) indexes of 0.790, 0.518, and 0.697 in the three areas, respectively, and the average value is 0.668. The Otsu threshold algorithm is the optimal among threshold methods, with AUC indexes of 0.970, 0.617, and 0.908 in the three regions respectively and an average AUC of 0.832.
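For readers unfamiliar with the two indices named in this abstract, the following is a minimal sketch of how NDWI and MNDWI are commonly computed per pixel. It uses the standard definitions, NDWI = (Green - NIR) / (Green + NIR) and MNDWI = (Green - SWIR) / (Green + SWIR). The reflectance values and the zero threshold are assumptions for illustration, not values from the study, which compares data-driven thresholds such as Otsu's method.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b); returns 0.0 if a + b == 0."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def ndwi(green, nir):
    """NDWI from green and near-infrared reflectance (Landsat-8 bands 3 and 5)."""
    return normalized_difference(green, nir)


def mndwi(green, swir):
    """MNDWI from green and shortwave-infrared reflectance (Landsat-8 bands 3 and 6)."""
    return normalized_difference(green, swir)


# Hypothetical surface reflectance values, for illustration only.
water_pixel = {"green": 0.10, "nir": 0.03, "swir": 0.01}
vegetation_pixel = {"green": 0.08, "nir": 0.40, "swir": 0.20}

THRESHOLD = 0.0  # assumed fixed threshold; the study evaluates threshold methods instead

for name, px in [("water", water_pixel), ("vegetation", vegetation_pixel)]:
    n = ndwi(px["green"], px["nir"])
    m = mndwi(px["green"], px["swir"])
    print(f"{name}: NDWI={n:.3f} MNDWI={m:.3f} water={n > THRESHOLD}")
```

Water reflects little in the near and shortwave infrared, so both indices tend to be positive over water and negative over vegetation, which is why a single threshold can separate the two.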
