On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping

Millard, Koreen; Richardson, Murray

doi:10.3390/rs70708489

Cited by 467 publications

(399 citation statements)

References 35 publications


“…However, the training data should be distributed evenly in geographic space to avoid generating spurious classification accuracies (Friedl, Brodley, and Strahler 1999), which is not possible in this study as the MODIS active fire detections are sparsely distributed. In addition, unlike supervised land cover classification approaches, where training sample points inherently have high spatial autocorrelation due to the way they are collected (Egorov et al. 2015; Millard and Richardson 2015), the training data in this study were derived from a random subset of the MODIS active fire detections and therefore are less likely to be spatially autocorrelated. As observed in similar regional fire-related random forest studies (Archibald et al. 2009; Oliveira et al. 2012), spatial autocorrelation of predictor variables may occur due to various physical and biological processes but no technique to incorporate spatial dependence has been reliably demonstrated and this remains an area of active research.…”

Section: Discussion (mentioning)

confidence: 99%

Multi-year MODIS active fire type classification over the Brazilian Tropical Moist Forest Biome

Roy; Kumar. 2016. International Journal of Digital Earth


Abstract: The Brazilian Tropical Moist Forest Biome (BTMFB) spans almost 4 million km² and is subject to extensive annual fires that have been categorized into deforestation, maintenance, and forest fire types. Information on fire types is important as they have different atmospheric emissions and ecological impacts. A supervised classification methodology is presented to classify the fire type of MODerate resolution Imaging Spectroradiometer (MODIS) active fire detections using training data defined by consideration of Brazilian government forest monitoring program annual land cover maps, and using predictor variables concerned with fuel flammability, fuel load, fire behavior, fire seasonality, fire annual frequency, proximity to surface transportation, and local temperature. The fire seasonality, local temperature, and fuel flammability were the most influential on the classification. Classified fire type results for all 1.6 million MODIS Terra and Aqua BTMFB active fire detections over eight years (2003-2010) are presented with an overall fire type classification accuracy of 90.9% (kappa 0.824). The fire type user's and producer's classification accuracies were respectively 92.4% and 94.4% (maintenance fires), 88.4% and 87.5% (forest fires), and, 88.7% and 75.0% (deforestation fires). The spatial and temporal distribution of the classified fire types are presented and are similar to patterns reported in the available recent literature.
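The overall accuracy, kappa, and user's and producer's accuracies quoted in this abstract are all derived from a confusion matrix. The following is a minimal Python sketch of those standard formulas, not the authors' code. The function name `accuracy_metrics` and the two-class matrix are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
def accuracy_metrics(matrix):
    """Summarise a square confusion matrix.

    matrix[i][j] is the number of samples whose reference class is i
    and whose mapped (predicted) class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    reference_totals = [sum(matrix[i]) for i in range(n)]                     # row sums
    mapped_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]   # column sums
    correct = [matrix[k][k] for k in range(n)]                                # diagonal

    overall = sum(correct) / total
    # Producer's accuracy: correct / reference total (complement of omission error).
    producers = [correct[k] / reference_totals[k] for k in range(n)]
    # User's accuracy: correct / mapped total (complement of commission error).
    users = [correct[k] / mapped_totals[k] for k in range(n)]
    # Agreement expected by chance, from the marginal totals.
    chance = sum(reference_totals[k] * mapped_totals[k] for k in range(n)) / total ** 2
    kappa = (overall - chance) / (1 - chance)
    return {"overall": overall, "producers": producers, "users": users, "kappa": kappa}


# Hypothetical two-class matrix, invented purely for illustration.
hypothetical_matrix = [[40, 10],
                       [5, 45]]
metrics = accuracy_metrics(hypothetical_matrix)
```

For this hypothetical matrix the overall accuracy is 0.85 and the chance agreement is 0.5, so kappa is 0.70.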


“…OOB error was used as a basis for comparison of classifications to determine optimum input parameters, years and seasons (Table 4) as described below. OOB error has been shown to be optimistic compared to independent sample validation accuracy [75,80], but when applied consistently in the same manner, it can be an efficient way to compare classifications and conduct variable selection. It was preferred over independent validation for this study given: (1) the field generated reference data set sample size was limited due to poor accessibility to all parts of the wetlands; and (2) both the field and image-based reference samples follow the general arcuate shape of the wetlands and were probably spatially auto-correlated.…”

Section: Image Classification (mentioning)

confidence: 99%

“…Although RF is generally considered robust to overfitting [106], it is highly likely that overall classification accuracy levels reported in this study were overly optimistic since the RF accuracy assessment was obtained from the "Out-of-Bag" (OOB) accuracy estimate, which is known to represent inflated accuracy [75,80]. OOB accuracy is useful for comparison of multiple classification models as in this study but independent validation is required to determine absolute accuracy.…”

Section: Limitations and Recommendations For Future Mapping Of Wetlands (mentioning)

confidence: 99%
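As background to the OOB statements above: each tree in a random forest is grown on a bootstrap sample of the reference data, and the samples not drawn for that tree are its out-of-bag samples. The Python sketch below illustrates only this sampling mechanism; it is not code from the cited study. The helper `bootstrap_split`, the sample count, and the tree count are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample (with replacement) and return
    the in-bag indices and the out-of-bag (never drawn) indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_samples = 1000         # hypothetical number of reference samples
n_trees = 50             # hypothetical number of trees

oob_fractions = []
for _ in range(n_trees):
    _, oob = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(oob) / n_samples)

# On average this is close to (1 - 1/n)^n, roughly 0.37 for large n.
mean_oob_fraction = sum(oob_fractions) / n_trees
```

Because the out-of-bag samples come from the same reference data set as the in-bag samples, they are not an independent sample. That is consistent with the statements above, which treat OOB accuracy as a tool for comparing models, not as a measure of absolute accuracy.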

“…In addition, as a result of the Dabus Marsh arcuate configuration in the landscape, the spatial distribution of the reference data shows a highly-clustered pattern with expected spatial autocorrelation, which was not measured. Such clustering of reference sample locations can contribute to classification accuracy inflation [80,108]; however, this could not be avoided. Finally, the field surveys represented less than 5-10% of the total area, and were mainly concentrated in the northwestern portion of the Dabus Wetlands.…”

Section: Limitations and Recommendations For Future Mapping Of Wetlands (mentioning)

confidence: 99%
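One general way to limit the effect of clustered reference samples on an accuracy estimate is to hold out whole spatial blocks instead of individual points, so that validation points are not immediate neighbours of training points. The Python sketch below illustrates that idea only. It is not the method used in any of the studies quoted here, and the function names, coordinates, block size, and test fraction are all hypothetical.

```python
import random


def block_id(x, y, block_size):
    """Return the grid block (column, row) that contains the point (x, y)."""
    return (int(x // block_size), int(y // block_size))


def spatial_block_split(points, block_size, test_fraction, seed=0):
    """Split point indices into train and test sets by whole grid blocks."""
    blocks = {}
    for index, (x, y) in enumerate(points):
        blocks.setdefault(block_id(x, y, block_size), []).append(index)

    block_keys = sorted(blocks)
    random.Random(seed).shuffle(block_keys)

    # Hold out at least one block; every point in a held-out block is a test point.
    n_test_blocks = max(1, round(len(block_keys) * test_fraction))
    test_keys = block_keys[:n_test_blocks]
    train_keys = block_keys[n_test_blocks:]

    train = sorted(i for key in train_keys for i in blocks[key])
    test = sorted(i for key in test_keys for i in blocks[key])
    return train, test


# Hypothetical sample locations in a 1000 x 1000 unit area.
point_rng = random.Random(1)
points = [(point_rng.uniform(0, 1000), point_rng.uniform(0, 1000)) for _ in range(200)]
train_idx, test_idx = spatial_block_split(points, block_size=250, test_fraction=0.25)
```

Points near a block edge can still have close neighbours in the adjacent block, so this reduces spatial dependence between the two sets but does not remove it.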


Mapping the Dabus Wetlands, Ethiopia, Using Random Forest Classification of Landsat, PALSAR and Topographic Data

et al. 2017


Abstract: The Dabus Wetland complex in the highlands of Ethiopia is within the headwaters of the Nile Basin and is home to significant ecological communities and rare or endangered species. Its many interrelated wetland types undergo seasonal and longer-term changes due to weather and climate variations as well as anthropogenic land use such as grazing and burning. Mapping and monitoring of these wetlands has not been previously undertaken due primarily to their relative isolation and lack of resources. This study investigated the potential of remote sensing based classification for mapping the primary vegetation groups in the Dabus Wetlands using a combination of dry and wet season data, including optical (Landsat spectral bands and derived vegetation and wetness indices), radar (ALOS PALSAR L-band backscatter), and elevation (SRTM derived DEM and other terrain metrics) as inputs to the non-parametric Random Forest (RF) classifier. Eight wetland types and three terrestrial/upland classes were mapped using field samples of observed plant community composition and structure groupings as reference information. Various tests to compare results using different RF input parameters and data types were conducted. A combination of multispectral optical, radar and topographic variables provided the best overall classification accuracy, 94.4% and 92.9% for the dry and wet season, respectively. Spectral and topographic data (radar data excluded) performed nearly as well, while accuracies using only radar and topographic data were 82-89%. Relatively homogeneous classes such as Papyrus Swamps, Forested Wetland, and Wet Meadow yielded the highest accuracies while spatially complex classes such as Emergent Marsh were more difficult to accurately classify. The methods and results presented in this paper can serve as a basis for development of long-term mapping and monitoring of these and other non-forested wetlands in Ethiopia and other similar environmental settings.


“…intensive for computation. Moreover, a machine learning algorithm, random forests (RF), has been widely used for classification of LULC types [21-27]. This method has the ability of optimizing both classification results and selection of remote sensing variables.…”

Section: Introduction (mentioning)

confidence: 99%

Improving Selection of Spectral Variables for Vegetation Classification of East Dongting Lake, China, Using a Gaofen-1 Image

et al. 2017


Abstract: There is a large amount of remote sensing data available for land use and land cover (LULC) classification and thus optimizing selection of remote sensing variables is a great challenge. Although many methods such as Jeffreys-Matusita (JM) distance and random forests (RF) have been developed for this purpose, the existing methods ignore correlation and information duplication among remote sensing variables. In this study, a novel approach was proposed to improve the measures of potential class separability for the selection of remote sensing variables by taking into account correlations among the variables. The proposed method was examined with a total of thirteen spectral variables from a Gaofen-1 image, three class separability measures including JM distance, transformed divergence and B-distance and three classifiers including Bayesian discriminant (BD), Mahalanobis distance (MD) and RF for classification of six LULC types at the East Dongting Lake of Hunan, China. The results showed that (1) The proposed approach selected the first three spectral variables and resulted in statistically stable classification accuracies for three improved class separability measures. That is, the classification accuracies using three or more spectral variables statistically did not significantly differ from each other at a significant level of 0.05; (2) The statistically stable classification accuracies obtained by integrating MD and BD classifiers with the improved class separability measures were also statistically not significantly different from those by RF; (3) The numbers of the selected spectral variables using the improved class separability measures to create the statistically stable classification accuracies by MD and BD classifiers were much smaller than those from the original class separability measures and RF; and (4) Three original class separability measures and RF led to similar ranks of importance of the spectral variables, while the ranks achieved by the improved class separability measures were different due to the consideration of correlations among the variables. This indicated that the proposed method more effectively and quickly selected the spectral variables to produce the statistically stable classification accuracies compared with the original class separability measures and RF and thus improved the selection of the spectral variables for the classification.
