Abstract: Data representation and prediction model design play an important role in mid- to long-term runoff prediction. However, because the runoff process is complex, it is challenging to extract the key factors that accurately characterize changes in the runoff of a river basin. Low prediction accuracy is a further problem for mid- to long-term runoff prediction. To address these problems, this paper proposes two improvements. First, a partial mutual information (PMI)-based approach is employed to estimate the importance of the candidate factors. Second, a deep learning architecture that combines the deep belief network (DBN) with partial least-squares regression (PLSR), together denoted PDBN, is introduced for mid- to long-term runoff prediction; PLSR is used to solve the parameter optimization problem of the DBN. The novelty of the proposed method lies in the key factor selection and in the forecasting method for mid- to long-term runoff. Experimental results demonstrate that the proposed method can significantly improve mid- to long-term runoff prediction. A comparison with current state-of-the-art prediction methods, i.e., DBN, backpropagation neural networks, and support vector machine models, further demonstrates the performance of the proposed method.
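The factor-ranking idea in this abstract can be illustrated with a small sketch. Note that this is not the PMI procedure of the paper: it swaps in plain (unconditional) mutual information estimated from a 2-D histogram, which ignores the redundancy between factors that PMI is designed to remove. The function names, the `bins` parameter, and the factor names in the test data are all hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=10):
    """Histogram estimate of the mutual information I(X; Y) in nats.

    Simplified stand-in for PMI: no conditioning on already-selected factors.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    independent = px @ py                     # product of the marginals
    mask = pxy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / independent[mask])))


def rank_factors(factors, runoff, bins=10):
    """Return (name, score) pairs sorted from most to least informative.

    `factors` is a hypothetical mapping from factor name to a 1-D series
    aligned with `runoff`.
    """
    scores = {name: mutual_information(values, runoff, bins)
              for name, values in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A histogram estimator is biased upward on small samples, so the scores are better read as a relative ranking than as absolute information values.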
The accuracy of medium- and long-term runoff forecasting plays a significant role in several applications involving the management of hydrological resources, such as power generation, water supply and flood mitigation. Numerous studies have adopted combined forecasting models to enhance runoff forecasting accuracy. Nevertheless, some models do not take into account the effects of different lag periods on the selection of input factors. To address this, this paper proposes a novel combined forecasting model for medium- and long-term runoff based on different lag periods. In this approach, the factors are initially selected by time-delay correlation analysis over different lag periods and are then further screened with stepwise regression analysis. Next, an extreme learning machine (ELM) is adopted to integrate the results obtained from three single models: multiple linear regression (MLR), feed-forward back propagation-neural network (FFBP-NN) and support vector regression (SVR), which is optimized by particle swarm optimization (PSO). To verify the effectiveness and versatility of the proposed combined model, the Lianghekou and Jinping hydrological stations in the Yalong River basin, China, are used as case studies. The experimental results indicate that, compared with MLR, FFBP-NN, SVR and ridge regression (RR), the proposed combined model improves the accuracy of medium- and long-term runoff forecasting in terms of the statistical indices MAE, MAPE, RMSE, DC, U95 and reliability.
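The first selection step described above, time-delay correlation analysis over different lag periods, might look roughly like the following sketch. It assumes the Pearson correlation coefficient as the correlation measure and a single candidate factor sampled at the same time steps as the runoff; the abstract specifies neither, and the function names and the `max_lag` parameter are hypothetical. The later steps (stepwise regression, the ELM combination) are not shown.

```python
import numpy as np


def lagged_correlations(factor, runoff, max_lag):
    """Pearson correlation between factor[t - k] and runoff[t] for k = 1..max_lag."""
    factor = np.asarray(factor, dtype=float)
    runoff = np.asarray(runoff, dtype=float)
    if factor.ndim != 1 or factor.shape != runoff.shape:
        raise ValueError("factor and runoff must be 1-D series of equal length")
    if not 1 <= max_lag <= len(runoff) - 2:
        raise ValueError("max_lag must leave at least two overlapping samples")
    correlations = {}
    for k in range(1, max_lag + 1):
        # Pair the factor value k steps earlier with the current runoff value.
        earlier_factor = factor[:-k]
        current_runoff = runoff[k:]
        correlations[k] = float(np.corrcoef(earlier_factor, current_runoff)[0, 1])
    return correlations


def best_lag(factor, runoff, max_lag):
    """Lag with the largest absolute correlation (ties go to the shortest lag)."""
    correlations = lagged_correlations(factor, runoff, max_lag)
    return max(correlations, key=lambda k: abs(correlations[k]))
```

Running `best_lag` once per candidate factor gives one lagged input per factor, which could then be passed to a screening step such as stepwise regression.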
In medium- and long-term runoff forecasting, machine learning faces several problems, such as high learning cost, limited computing resources, and the difficulty of satisfying statistical data assumptions in some regions, which make it hard to popularize in the hydrology industry. When data are scarce, analyzing the consistency of data characteristics is one way to address the problem. This paper analyzes the statistical assumptions of machine learning and runoff data characteristics such as periodicity and mutation. To address the effect of inconsistent data characteristics on three representative machine learning models (multiple linear regression, random forest, and back propagation neural network), a simple correction/improvement method suitable for engineering use is proposed. The model results were verified in the Danjiangkou area, China. The results show that the errors of the three models have the same distribution as the periodic characteristics of the runoff periods, and that the correction/improvement based on periodicity and mutation characteristics can improve the forecasting accuracy of the three models. The back propagation neural network model is the most sensitive to the consistency of data characteristics.
INDEX TERMS Danjiangkou reservoir, data characteristics consistency, machine learning, medium and long-term runoff forecasting, mutation characteristics, periodicity characteristics.
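The abstract does not say how the periodicity-based correction is computed. One simple possibility, shown here purely as an assumption, is a per-phase bias correction: average the training residuals for each position in the cycle (for example each calendar month when `period=12`) and add that average back to later forecasts. The function names and the `period` and `start_index` parameters are hypothetical.

```python
import numpy as np


def periodic_bias(observed, predicted, period=12):
    """Mean residual (observed - predicted) for each phase of the cycle."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape or observed.ndim != 1:
        raise ValueError("observed and predicted must be 1-D series of equal length")
    if len(observed) < period:
        raise ValueError("need at least one full period of training data")
    residual = observed - predicted
    phase = np.arange(len(residual)) % period
    return np.array([residual[phase == p].mean() for p in range(period)])


def apply_periodic_correction(predicted, bias, start_index=0):
    """Add the phase-specific bias to each forecast.

    `start_index` is the position of the first forecast relative to the
    start of the training series, so that the phases line up.
    """
    predicted = np.asarray(predicted, dtype=float)
    phase = (start_index + np.arange(len(predicted))) % len(bias)
    return predicted + bias[phase]
```

This covers only the periodicity part; a mutation (abrupt change) correction would need a change-point step that is not sketched here.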