Estimation of CO2 flux components over northern hemisphere forest ecosystems by using random forest method through temporal and spatial data scanning procedures
“…RF is an ensemble technique that utilizes multiple decision trees trained through bootstrap aggregating [51]. RF offers the advantage of generating reasonable predictions without requiring hyper-parameter tuning and mitigating overfitting issues commonly observed in decision trees [52][53][54]. To ensure the universality of RF models, the historical datasets, including observed streamflow datasets of the Global Runoff Data Centre (GRDC) and GCM datasets from the Coupled Model Intercomparison Project Phase 6 (CMIP6) and the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP), are divided into two subsets by random sampling: the subset containing 70% of the data is used to calibrate the model, and the subset containing the remaining 30% of the data is used for validation.…”
In the context of global climate warming, the propagation of meteorological drought (MD) may aggravate the devastating impact of hydrological drought (HD) on water security and sustainable development. Accurately predicting drought propagation and effectively quantifying the effects of uncertainty remain challenging, especially in data-deficient regions. In this study, a novel method called RFCFA is developed by integrating random forest (RF), copula, and factorial analysis (FA) into a general framework, and is applied to the Aral Sea Basin (ASB; a typical arid and data-scarce basin in Central Asia) while considering the impact of climate change. Several findings can be summarized: (1) the projected future drought propagation probability of the ASB is 39.2%, which is about 8% higher than the historical level; (2) drought propagation is mainly affected by mean climate condition, catchment characteristics (i.e., elevation, LUCC, and slope), and human activities (i.e., irrigation and reservoir operation); (3) a lower propagation probability is expected in spring under SSP1-2.6 due to increased snow meltwater, and the drought propagation probability is highest in autumn (reaching 45.4%) under the influence of reservoir operation; (4) the combined effects of meteorological conditions and agricultural irrigation can lead to a higher probability of future propagation in the upper river basin in summer. The findings are valuable for predicting drought propagation risk, revealing the main factors and inherent uncertainties, and supporting drought management and disaster prevention.
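The 70%/30% random split into calibration and validation subsets mentioned in the excerpt above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `random_split`, the fixed seed, and the placeholder records are all assumptions made for the example.

```python
import random


def random_split(records, calib_fraction=0.7, seed=0):
    """Randomly split records into calibration and validation subsets.

    Hypothetical helper: the name, the default seed, and the rounding
    rule are illustrative assumptions, not taken from the cited study.
    """
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_calib = round(calib_fraction * len(shuffled))
    return shuffled[:n_calib], shuffled[n_calib:]


# Placeholder records standing in for the historical samples.
samples = list(range(100))
calibration, validation = random_split(samples)
print(len(calibration), len(validation))
```

Fixing the seed keeps the split reproducible, so a model calibrated on the first subset can be re-evaluated later on exactly the same validation subset.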
“…The net ecosystem carbon exchange (NEE) is the key carbon flux component within terrestrial ecosystems and plays an essential role in a better understanding of the global carbon cycle and land–atmosphere interaction (Shiri et al, 2022). Accurate estimation and validation of NEE of the terrestrial ecosystems in regions or globally are of great significance in evaluating the function of the regional carbon source and sink.…”
Section: Introduction (mentioning)
confidence: 99%
“…The ongoing efforts of the FLUXNET community and continuous improvement of the spatiotemporal resolution of remote sensing data have encouraged the application of the data‐driven machine learning (ML) method such as the random forest (RF, Shiri et al, 2022), artificial neural networks (ANNs, Evrendilek, 2014), support vector regression (SVR, Ichii et al, 2017), cubist (Xiao et al, 2008; Xiao et al, 2011) or model trees ensemble (MTE, Jung et al, 2009, 2011) to estimate the terrestrial ecosystems' carbon dioxide, water and energy fluxes from a site scale to the regional and global scale (Xiao et al, 2019). The accuracy of the ML model is generally better than linear regression, ecosystem model, remote sensing inversion and other model methods, which has been proved in the application research of related geosciences (Reichstein et al, 2019).…”
Eddy covariance (EC) flux stations are of limited use for evaluating global net ecosystem carbon exchange (NEE) and for reducing its uncertainty, because they are sparsely and unevenly distributed and have limited spatial representativeness. Linking the EC stations with widely distributed meteorological stations through machine learning (ML) and remote sensing could substantially improve the accuracy of global NEE assessment and reduce its uncertainty.
In this study, we developed a framework for estimating NEE at meteorological stations. We first optimized the hyperparameters and input variables of the ML model using an adaptive genetic algorithm. Then, we developed 566 random forest (RF)‐based NEE estimation models using a spatial leave‐one‐out cross‐validation strategy. We established a Euclidean distance‐based algorithm that projects the coefficient of determination (R2), which makes it possible to assess how accurately each model would estimate NEE at a specific weather station. For each meteorological station, only the model with the highest projected R2 among the models with R2 > 0.5 was selected to estimate its NEE.
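One plausible reading of a spatial leave-one-out strategy is that each fold withholds all samples from one site and trains on the rest. The sketch below shows only that fold construction, under that assumption; the function name `leave_one_site_out` and the toy site labels are hypothetical, and the study's actual procedure may differ.

```python
def leave_one_site_out(site_ids):
    """Yield (held_out_site, train_indices, test_indices) for each site.

    Hypothetical helper: every sample from one site is withheld in turn,
    so a model is never tested on a site it was trained on.
    """
    for held_out in sorted(set(site_ids)):
        train_idx = [i for i, s in enumerate(site_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(site_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy site labels, one per sample (illustrative only).
site_ids = ["A", "A", "B", "C", "C", "C"]
folds = list(leave_one_site_out(site_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by site rather than by individual sample avoids leaking information between nearby observations from the same location into the test fold.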
Of 10,289 weather stations around the world, 4674 could be matched to at least one of the 566 NEE estimation models with a projected accuracy of R2 > 0.5. The NEE estimation models screened for the meteorological stations performed reliably and achieved higher accuracy than previous studies. The NEE values at most (96.9%) of the screened meteorological stations are negative (carbon sink), and most (65.3%) of those show an increasing trend in mean annual NEE (carbon sink).
The NEE dataset produced at the meteorological stations can supplement the EC observations and serve as quasi‐observation data for assessing global gridded NEE products. The NEE dataset is publicly available on figshare at https://doi.org/10.6084/m9.figshare.20485563.v1.
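The screening rule described in the abstract (keep only models whose projected R2 exceeds 0.5, then take the one with the highest value) can be illustrated as below. The projected R2 values are assumed to be given; the Euclidean distance-based projection itself is not reproduced here, and `select_model`, the model identifiers, and the numbers are hypothetical.

```python
def select_model(projected_r2, threshold=0.5):
    """Return the model id with the highest projected R2 above the threshold.

    `projected_r2` maps a model id to its projected R2 at one station.
    Returns None when no model exceeds the threshold, i.e. the station
    is not matched to any model. Hypothetical helper for illustration.
    """
    eligible = {m: r2 for m, r2 in projected_r2.items() if r2 > threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Made-up projected R2 values for two stations.
station_a = {"model_1": 0.42, "model_2": 0.61, "model_3": 0.57}
station_b = {"model_1": 0.30, "model_2": 0.48}
print(select_model(station_a))
print(select_model(station_b))
```

A station for which the function returns None corresponds to the stations that could not be matched to any model at the R2 > 0.5 level.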