2017
DOI: 10.3390/a10040114
Variable Selection in Time Series Forecasting Using Random Forests

Abstract: Time series forecasting using machine learning algorithms has gained popularity recently. Random forest is a machine learning algorithm implemented in time series forecasting; however, most of its forecasting properties have remained unexplored. Here we focus on assessing the performance of random forests in one-step forecasting using two large datasets of short time series with the aim to suggest an optimal set of predictor variables. Furthermore, we compare its performance to benchmarking methods. The first …
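The abstract's setting, one-step-ahead forecasting with a random forest trained on lagged values of the series as predictor variables, can be sketched as follows. The toy series, the lag depth, and the hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch of one-step-ahead random forest forecasting with lagged predictors.
# All data and parameter choices here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))  # toy random-walk series

n_lags = 5  # lag depth of the predictor set (the kind of choice the paper studies)
X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
y = series[n_lags:]  # one-step-ahead targets

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:-1], y[:-1])            # train on all but the final observation
forecast = model.predict(X[-1:])[0]  # one-step forecast from the last lags
```

Each row of `X` holds the `n_lags` most recent values before the target, so the model learns a mapping from recent history to the next observation.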

Cited by 121 publications (62 citation statements)
References 54 publications
“…is the moving average operator and [17,18]. Box and Jenkins (1970) created the building blocks of ARIMA, breaking down the prediction process into three iterative steps: identification, estimation, and validation, as seen in Figure 1 [3,19,20].…”
Section: Autoregressive Integrated Moving Average (ARIMA)
confidence: 99%
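The three Box-Jenkins steps named in this excerpt (identification, estimation, validation) can be sketched on a toy AR(1) series. This numpy-only illustration is a simplification of the full ARIMA workflow described in the cited work; the series and its parameter are assumptions.

```python
# Box-Jenkins in miniature on a simulated AR(1) process x[t] = phi*x[t-1] + noise.
# Illustrative simplification, not the full ARIMA procedure of the cited work.
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations at lags 1..nlags."""
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:len(x) - k] * x[k:]) / denom
                     for k in range(1, nlags + 1)])

rng = np.random.default_rng(1)
n, phi = 500, 0.6
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# 1. Identification: a slowly decaying sample ACF suggests an AR term.
acf = sample_acf(x, 3)

# 2. Estimation: least-squares fit of x[t] on x[t-1].
phi_hat = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)

# 3. Validation: residuals should resemble white noise (ACF near zero).
resid = x[1:] - phi_hat * x[:-1]
resid_acf = sample_acf(resid, 3)
```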
“…The RF model provides an OOB data-based unbiased estimation error for the test dataset [61]. Performance of the algorithm depends on the selected parameters, such as the number of trees [52,62], splitting at each node of each tree [60,63], and the number of examples in each cell, below which the cell is not split [38], which equals the default value of nodesize [64]. In this study, the default value was used as recommended in the literature.…”
Section: Discussion
confidence: 99%
“…S1 (see Supplement). The seasonality pattern is obvious in the sample autocorrelation function (ACF) of the original time series and reduced in the sample ACF of the deseasonalized time series, while the estimates of the Hurst parameter (H) of the Hurst-Kolmogorov process (for its definition see Supplement; see also Tyralis et al., 2018), when the latter is fitted to the deseasonalized time series as described in Tyralis and Koutsoyiannis (2011), have a median value of 0.75 and, therefore, indicate significant long-range dependence. We note that the parameter H is commonly used in the literature for measuring this dependence under the established assumption that the latter is present in the various geophysical processes.…”
Section: Methods
confidence: 99%
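The excerpt reads long-range dependence off the Hurst parameter H (median 0.75 there). A rough numpy-only sketch of one classical estimator, the aggregated-variance method, is given below; for white noise it should land near H = 0.5, the no-dependence baseline. This is an illustrative estimator, not the Hurst-Kolmogorov maximum-likelihood fit the cited study uses.

```python
# Aggregated-variance Hurst estimate: Var(block means of size m) ~ m^(2H - 2),
# so the log-log slope of variance vs. m gives H = 1 + slope/2.
# Illustrative method, not the fitting procedure of the cited study.
import numpy as np

def hurst_aggvar(x, scales=(1, 2, 4, 8, 16)):
    """Estimate H from the scaling of the variance of aggregated means."""
    variances = []
    for m in scales:
        n_blocks = len(x) // m
        blocks = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        variances.append(blocks.var())
    slope = np.polyfit(np.log(scales), np.log(variances), 1)[0]
    return 1 + slope / 2

rng = np.random.default_rng(2)
h_white = hurst_aggvar(rng.normal(size=4096))  # expect roughly H = 0.5
```

Persistent (long-range dependent) series decay slower than 1/m in this variance plot, pushing the estimate above 0.5, which is how an H of 0.75 signals significant dependence.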