Hristiana Stoynova scite author profile

Accounting for Training Data Error in Machine Learning Applied to Earth Observations

Remote sensing, or Earth Observation (EO), is increasingly used to understand Earth system dynamics and to create continuous and categorical maps of biophysical properties and land cover, especially given recent advances in machine learning (ML). ML models typically require large, spatially explicit training datasets to make accurate predictions. Training data (TD) are typically generated by digitizing polygons on high spatial-resolution imagery, by collecting in situ data, or by using pre-existing datasets. TD are often assumed to accurately represent the truth, but in practice they almost always contain error, stemming from (1) sample design and (2) sample collection. The latter is particularly relevant for image-interpreted TD, an increasingly common method due to its practicality and the growing training sample size requirements of modern ML algorithms. TD errors can cause substantial errors in the maps created using ML algorithms, which may impact map use and interpretation. Despite these potential errors and their real-world consequences for map-based decisions, TD error is often not accounted for or reported in EO research. Here we review the current practices for collecting and handling TD. We identify the sources of TD error, illustrate their impacts using several case studies representing different EO applications (infrastructure mapping, global surface flux estimates, and agricultural monitoring), and provide guidelines for minimizing and accounting for TD errors. To harmonize terminology, we distinguish TD from three other classes of data that should be used to create and assess ML models: training reference data, used to assess the quality of TD during data generation; validation data, used to iteratively improve models; and map reference data, used only for final accuracy assessment.
We focus primarily on TD, but our advice is generally applicable to all four classes, and we ground our review in established best practices from the map accuracy assessment literature. EO researchers should start by determining the tolerable levels of map error and appropriate error metrics. Next, TD error should be minimized during sample design by choosing a representative spatio-temporal collection strategy, by using spatially and temporally relevant imagery and ancillary data sources during TD creation, and by selecting a set of legend definitions supported by the data. Furthermore, TD error can be minimized during the collection of individual samples by using consensus-based collection strategies, by directly comparing interpreted training observations against expert-generated training reference data to derive TD error metrics, and by providing image interpreters with thorough application-specific training. We strongly advise that TD error be incorporated in model outputs, either directly in bias and variance estimates or, at a minimum, by documenting the sources and implications of error. TD should be fully documented and made available via an open TD repository, allowing others to replicate and assess their use. To guide researchers in this process, we propose three tiers of TD error accounting standards. Finally, we advise researchers to clearly communicate the magnitude and impacts of TD error on map outputs, with specific consideration given to the likely map audience.
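
As an illustration of the comparison described above, the following is a minimal Python sketch of how interpreter-assigned training labels might be scored against expert-generated training reference data. The function name `td_error_metrics`, the class names, and the label lists are hypothetical and are not taken from the paper; the metrics shown (overall agreement and per-class agreement) are one simple choice, not necessarily the ones the authors use.

```python
from collections import Counter


def td_error_metrics(interpreted, reference):
    """Score interpreter labels against training reference labels.

    Returns (overall agreement, per-class agreement, confusion counts).
    Confusion counts are keyed by (reference_label, interpreted_label).
    """
    if len(interpreted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("label lists must not be empty")

    # Count every (reference, interpreted) label pair.
    confusion = Counter(zip(reference, interpreted))

    # Overall agreement: share of samples where both labels match.
    matches = sum(n for (ref, interp), n in confusion.items() if ref == interp)
    overall = matches / len(reference)

    # Per-class agreement: for each reference class, the share of its
    # samples that the interpreter labeled with the same class.
    per_class = {}
    for cls in sorted(set(reference)):
        total = sum(n for (ref, _), n in confusion.items() if ref == cls)
        per_class[cls] = confusion[(cls, cls)] / total

    return overall, per_class, confusion


# Hypothetical labels for eight sample locations (illustration only).
reference = ["crop", "crop", "forest", "forest", "urban", "urban", "crop", "forest"]
interpreted = ["crop", "forest", "forest", "forest", "urban", "crop", "crop", "forest"]

overall, per_class, confusion = td_error_metrics(interpreted, reference)
print(f"overall agreement: {overall:.2f}")
for cls, score in per_class.items():
    print(f"{cls}: {score:.2f}")
```

Error rates of this kind could then be reported alongside the map or propagated into model uncertainty estimates, in line with the recommendation to account for TD error in model outputs.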

Quantification of Urban Forest and Grassland Carbon Fluxes Using Field Measurements and a Satellite‐Based Model in Washington DC/Baltimore Area

Winbourne, Smith, Stoynova, et al. 2022. JGR Biogeosciences

Cities are taking the lead on climate change mitigation with ambitious goals to reduce carbon dioxide (CO2) emissions. The implementation of effective mitigation policies will require accurate measurements to guide policy decisions and monitor their efficacy. Here, we present a comprehensive CO2 inventory of an urban temperate forest and unmanaged grassland using field observations. We estimate the annual storage of CO2 by vegetation and soils and place our biogenic flux estimates in the context of local fossil fuel (FF) emissions to determine when, where, and by how much biogenic fluxes alter net CO2 flux dynamics. We compare our hourly estimates of biogenic fluxes in the forest site to modeled estimates using a modified version of the Urban‐Vegetation Photosynthesis and Respiration Model (Urban‐VPRM) in the Washington DC/Baltimore area, presenting the first urban evaluation of this model. We estimate that vegetation results in a net biogenic uptake of −2.62 ± 1.9 Mg C ha−1 yr−1 in the forest site. FF emissions, however, drive patterns in the net flux, resulting in the region being a net source of CO2 on daily and annual timescales. In the summer afternoons, however, the net flux is dominated by the uptake of CO2 by vegetation. The Urban‐VPRM closely approximates hourly forest-inventory-based estimates of gross ecosystem exchange but overestimates ecosystem respiration in the dormant season by 40%. Our study highlights the importance of including seasonal dynamics in biogenic CO2 fluxes when planning and testing the efficacy of CO2 emission reduction policies and when developing monitoring programs.

Accounting for Training Data Error in Machine Learning Applied to Earth Observations
