Combined Global Surface Summary of Day and European Climate Assessment and Dataset daily meteorological data sets (around 9000 stations) were used to build spatio-temporal geostatistical models and predict daily air temperature at ground resolution of 1 km for the global land mass. Predictions in space and time were made for the mean, maximum, and minimum temperatures using spatio-temporal regression-kriging with a time series of Moderate Resolution Imaging Spectroradiometer (MODIS) 8-day images, topographic layers (digital elevation model and topographic wetness index), and a geometric temperature trend as covariates. The accuracy of predicting daily temperatures was assessed using leave-one-out cross-validation. To account for geographical point clustering of station data and get a more representative cross-validation accuracy, predicted values were aggregated to blocks of land of size 500 × 500 km. Results show that the average accuracy for predicting mean, maximum, and minimum daily temperatures is root-mean-square error (RMSE) = ±2 °C for areas densely covered with stations and between ±2 °C and ±4 °C for areas with lower station density. The lowest prediction accuracy was observed at high altitudes (> 1000 m) and in Antarctica, with an RMSE around 6 °C. The model and predictions were built for the year 2011 only, but the same methodology could be extended for the whole range of the MODIS land surface temperature images (2001 to today), i.e., to produce global archives of daily temperatures (a next-generation http://WorldClim.org repository) and to feed various global environmental models.
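The block aggregation mentioned in this abstract can be illustrated with a minimal sketch. It rests on one reading of the abstract, not on the authors' code: cross-validation errors are grouped into 500 × 500 km blocks, an RMSE is computed per block, and the block values are then averaged so that densely clustered stations do not dominate the score. The function names, the record layout, and the use of projected coordinates in kilometres are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def block_rmse(records, block_km=500.0):
    """Return a dict mapping block index -> RMSE of the records in that block.

    records: iterable of (x_km, y_km, observed, predicted) tuples, with
    coordinates in a projected system measured in kilometres (assumed).
    """
    squared_errors = defaultdict(list)
    for x, y, observed, predicted in records:
        # Integer block index along each axis (hypothetical gridding scheme).
        key = (math.floor(x / block_km), math.floor(y / block_km))
        squared_errors[key].append((observed - predicted) ** 2)
    return {key: math.sqrt(sum(v) / len(v)) for key, v in squared_errors.items()}


def mean_block_rmse(records, block_km=500.0):
    """Average the per-block RMSE values, giving every block equal weight."""
    per_block = block_rmse(records, block_km)
    return sum(per_block.values()) / len(per_block)


def pooled_rmse(records):
    """Plain RMSE over all records, for comparison with the block average."""
    errors = [(obs - pred) ** 2 for _, _, obs, pred in records]
    return math.sqrt(sum(errors) / len(errors))


# Made-up cross-validation records: two stations share one block, one is alone.
cv_records = [
    (10.0, 10.0, 20.0, 22.0),
    (20.0, 30.0, 15.0, 13.0),
    (600.0, 10.0, 5.0, 9.0),
]
print(block_rmse(cv_records))
print(mean_block_rmse(cv_records), pooled_rmse(cv_records))
```

In this toy input the sparsely covered block carries as much weight as the dense one, so the block average comes out higher than the pooled RMSE.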
Abstract: The goal of this study is to analyse the predictive performance of the random forest machine learning technique in comparison to commonly used hedonic models based on multiple regression for the prediction of apartment prices. A data set that includes 7407 records of apartment transactions referring to real estate sales from 2008-2013 in the city of Ljubljana, the capital of Slovenia, was used in order to test and compare the predictive performances of both models. Apparent challenges faced during modelling included (1) the non-linear nature of the prediction assignment task; (2) input data being based on transactions occurring over a period of great price changes in Ljubljana, whereby a 28% decline was noted in six consecutive testing years; and (3) the complex urban form of the case study area. Available explanatory variables, organised as a Geographic Information Systems (GIS) ready dataset, including the structural and age characteristics of the apartments as well as environmental and neighbourhood information, were considered in the modelling procedure. All performance measures (R² values, sales ratios, mean absolute percentage error (MAPE), coefficient of dispersion (COD)) revealed significantly better results for predictions obtained by the random forest method, which confirms the potential of this machine learning technique for apartment price prediction.
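As a rough illustration of the performance measures named in this abstract, the sketch below computes sales ratios, MAPE, and COD for a handful of made-up prices. The definitions are the ones commonly used in mass appraisal and are assumed here, not taken from the paper: the sales ratio is predicted price divided by sale price, MAPE is read in its usual sense of mean absolute percentage error, and COD is the mean absolute deviation of the ratios from their median, expressed as a percentage of that median. Function names are hypothetical.

```python
from statistics import median


def sales_ratios(predicted, actual):
    """Ratio of predicted price to observed sale price for each transaction."""
    return [p / a for p, a in zip(predicted, actual)]


def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(a - p) / a for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


def cod(predicted, actual):
    """Coefficient of dispersion of the sales ratios around their median, in percent."""
    ratios = sales_ratios(predicted, actual)
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


# Made-up sale prices and model predictions, for illustration only.
actual_prices = [100.0, 200.0, 400.0]
predicted_prices = [90.0, 200.0, 440.0]

print(sales_ratios(predicted_prices, actual_prices))
print(mape(predicted_prices, actual_prices))
print(cod(predicted_prices, actual_prices))
```

Lower MAPE and COD values indicate predictions that are closer to, and more uniformly spread around, the observed prices.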
For many decades, kriging and deterministic interpolation techniques, such as inverse distance weighting and nearest neighbour interpolation, have been the most popular spatial interpolation techniques. Kriging with external drift and regression kriging have become basic techniques that benefit both from spatial autocorrelation and covariate information. More recently, machine learning techniques, such as random forest and gradient boosting, have become increasingly popular and are now often used for spatial interpolation. Some attempts have been made to explicitly take the spatial component into account in machine learning, but so far, none of these approaches have taken the natural route of incorporating the nearest observations and their distances to the prediction location as covariates. In this research, we explored the value of including observations at the nearest locations and their distances from the prediction location by introducing Random Forest Spatial Interpolation (RFSI). We compared RFSI with deterministic interpolation methods, ordinary kriging, regression kriging, Random Forest and Random Forest for spatial prediction (RFsp) in three case studies. The first case study made use of synthetic data, i.e., simulations from normally distributed stationary random fields with a known semivariogram, for which ordinary kriging is known to be optimal. The second and third case studies evaluated the performance of the various interpolation methods using daily precipitation data for the 2016–2018 period in Catalonia, Spain, and mean daily temperature for the year 2008 in Croatia. Results of the synthetic case study showed that RFSI outperformed most simple deterministic interpolation techniques and performed similarly to inverse distance weighting and RFsp. As expected, kriging was the most accurate technique in the synthetic case study. In the precipitation and temperature case studies, RFSI mostly outperformed regression kriging, inverse distance weighting, random forest, and RFsp. Moreover, RFSI was substantially faster than RFsp, particularly when the training dataset was large and high-resolution prediction maps were made.
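The central idea of RFSI, using the nearest observations and their distances to the prediction location as covariates, can be sketched as a small feature-construction step. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the planar Euclidean distance, and the interleaved feature layout (value, distance, value, distance, ...) are all hypothetical. The resulting rows would then be passed, together with any other covariates, to a random forest library of choice, which is not shown here.

```python
import math


def rfsi_features(target, stations, n_nearest=3):
    """Build covariates for one location from its nearest observations.

    target: (x, y) of the prediction location.
    stations: list of (x, y, value) observations.
    Returns [value_1, dist_1, value_2, dist_2, ...], ordered from the
    nearest station outwards (layout assumed for illustration).
    """
    tx, ty = target
    ranked = sorted(
        (math.hypot(sx - tx, sy - ty), value) for sx, sy, value in stations
    )
    features = []
    for dist, value in ranked[:n_nearest]:
        features.extend([value, dist])
    return features


def training_rows(stations, n_nearest=3):
    """Build (features, target) pairs, leaving each station out of its own neighbours."""
    rows = []
    for i, (x, y, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        rows.append((rfsi_features((x, y), others, n_nearest), value))
    return rows


# Made-up observations: (x, y, measured value).
observations = [
    (0.0, 0.0, 10.0),
    (3.0, 4.0, 20.0),
    (6.0, 8.0, 30.0),
    (0.0, 1.0, 12.0),
]

print(rfsi_features((0.0, 0.0), observations, n_nearest=2))
print(training_rows(observations, n_nearest=2)[0])
```

Excluding each station from its own neighbour list during training keeps the model from simply copying the observation it is asked to predict.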