Missing data is a common problem in hydrological studies; therefore, data reconstruction is critical, especially when it is crucial to employ all available resources, even incomplete records. Furthermore, missing data could have an impact on statistical analysis results, and the amount of variability in the data would not be fittingly anticipated. As a result, this study compared the performance of three imputation methods in predicting recurrence in streamflow datasets: robust random regression imputation (RRRI), k-nearest neighbours (k-NN), and classification and regression tree (CART). Furthermore, entire historical daily streamflow data from 2012 to 2014 (as training dataset) were utilised to assess and validate the effectiveness of the imputation methods in addressing missing streamflow data. Following that, all three methods coupled with multiple linear regression (MLR), were used to restore streamflow rates in Malaysia's Langat River Basin from 1978 to 2016. The estimation techniques effectiveness was evaluated using metrics inclusive of the Nash-Sutcliffe efficiency coefficient (CE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results confirmed that RRRI coupled with MLR (RRRI-MLR) had the lowest RMSE and MAPE values, outperforming all other techniques tested for filling missing data in daily streamflow datasets. This indicates that the RRRI-MLR is the best method for dealing with missing data in streamflow datasets. Doi: 10.28991/cej-2021-03091747 Full Text: PDF
For more than 25 years, the Department of Environment (DOE) of Malaysia has implemented a water quality index (WQI) that uses six key water quality parameters: dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD), pH, ammoniacal nitrogen (AN), and suspended solids (SS). Water quality analysis is an essential component of water resources management that must be properly managed to prevent ecological damage from pollution and to ensure compliance with environmental regulations. This increases the need to define an efficient method for WQI analysis. One of the major challenges with the current calculation of the WQI is that it requires a series of sub-index calculations that are time consuming, complex, and prone to error. In addition, the WQI cannot be calculated if one or more water quality parameters are missing. In this study, the optimization method of WQI was developed to address the complexity of the current process. The potential of data-driven modeling, i.e., Support Vector Machine (SVM) based on Nu-Radial basis function with 10-fold cross-validation, was developed and explored to improve the prediction of WQI in Langat watershed. A thorough sensitivity analysis under six scenarios was also conducted to determine the efficiency of the model in WQI prediction. In the first scenario, the model SVM-WQI showed exceptional ability to replicate the DOE-WQI and obtained statistical results at a very high level (correlation coefficient, r > 0.95, Nash Sutcliffe efficiency, NSE >0.88, Willmott’s index of agreement, WI > 0.96). In the second scenario, the modeling process showed that the WQI can be estimated without any of the six parameters. It can be seen that the parameter DO is the most important factor in determining the WQI. The pH is the factor that affects the WQI the least. Moreover, scenarios three to six show the efficiency of the model in terms of time and cost by minimizing the number of variables in the input combination of the model (r > 0.6, NSE >0.5 (good), WI > 0.7 (very good)). In summary, the model will greatly improve and accelerate data-driven decision making in water quality management by making data more accessible and attractive without human intervention.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.