Modeling surface water quality using soft computing techniques is essential for the effective management of scarce water resources and environmental protection. The development of accurate predictive models with significant input parameters and inconsistent datasets is still a challenge. Therefore, further research is needed to improve the performance of the predictive models. This study presents a methodology for dataset pre-processing and input optimization to reduce modeling complexity. A two-sided detection approach was employed for outlier removal, and an exhaustive search method was used to select the essential modeling inputs. Thereafter, the adaptive neuro-fuzzy inference system (ANFIS) was applied to model electrical conductivity (EC) and total dissolved solids (TDS) in the upper Indus River. A large dataset covering a 30-year historical period, measured monthly, was used in the modeling process. The predictive capacity of the developed models was evaluated using statistical assessment indicators. Moreover, 10-fold cross-validation was carried out to address model overfitting. The results of the input optimization indicate that Ca²⁺, Na⁺, and Cl⁻ are the most relevant inputs for EC, while Mg²⁺, HCO₃⁻, and SO₄²⁻ were selected to model TDS levels. The optimum ANFIS models for EC and TDS showed R values of 0.91 and 0.92 and root mean squared error (RMSE) values of 30.6 µS/cm and 16.7 ppm, respectively. The optimum ANFIS structure comprises a hybrid training algorithm with 27 fuzzy rules, using triangular membership functions for EC and Gaussian membership functions for TDS. The outcome of the present study indicates that ANFIS modeling, aided by data pre-processing and input optimization, is a suitable technique for simulating the quality of surface water. It could be an effective approach for minimizing modeling complexity and devising proper management and mitigation measures.
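The abstract above does not say which two-sided outlier rule was applied. As a minimal sketch, assuming the common interquartile-range (IQR) fence rule with a multiplier of 1.5, removal on both tails might look like the following. The function name, the multiplier, and the sample readings are all hypothetical.

```python
import statistics


def remove_outliers_two_sided(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: an IQR fence rule; the source only says "two-sided detection".
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Both tails are checked, so low and high outliers are dropped alike.
    return [v for v in values if lower <= v <= upper]


# Hypothetical monthly EC readings (µS/cm) with one low and one high outlier.
ec_readings = [210, 225, 218, 230, 222, 215, 900, 5]
cleaned = remove_outliers_two_sided(ec_readings)
print(cleaned)
```

Filtering both tails matters here because sensor faults can produce implausibly low readings as well as spikes.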
Water pollution is a growing global issue that threatens human health, ecosystem functions, and agricultural production. The distinguishing features of artificial intelligence (AI) based modeling can deliver deep insight into rising water quality concerns. The current study investigates the predictive performance of gene expression programming (GEP), artificial neural network (ANN), and linear regression model (LRM) for modeling monthly total dissolved solids (TDS) and specific conductivity (EC) in the upper Indus River at two outlet stations. In total, 30 years of historical water quality data, comprising 360 TDS and EC monthly records, were used for model training and testing. Seven input parameters were selected for TDS and EC modeling on the basis of significant correlation. Results were evaluated using various performance indicators, error assessment, and external criteria. The simulated outcomes of the models indicated a strong association with the actual data, with a correlation coefficient above 0.9 for both TDS and EC. Both the GEP and ANN models proved to be reliable techniques for predicting TDS and EC. The explicit mathematical equations produced by GEP distinguish it from ANN and LRM. The sensitivity analysis ranked the input variables affecting TDS, in decreasing order of contribution, as HCO₃⁻ (22.33%) > Cl⁻ (21.66%) > Mg²⁺ (16.98%) > Na⁺ (14.55%) > Ca²⁺ (12.92%) > SO₄²⁻ (11.55%) > pH (0%), while for EC the order was HCO₃⁻ (42.36%) > SO₄²⁻ (25.63%) > Ca²⁺ (13.59%) > Cl⁻ (12.8%) > Na⁺ (5.01%) > pH (0.61%) > Mg²⁺ (0%). The parametric analysis revealed that the models incorporated the effect of all the input parameters in the modeling process. The external assessment criteria confirmed the generalized outcome and robustness of the proposed approaches. In conclusion, the outcomes of this study demonstrate that the formulation of AI-based models is cost-effective and helpful for river water quality assessment, management, and policy making.
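The abstract reports sensitivity percentages that sum to roughly 100% but does not give the formula behind them. One formulation often used for this kind of analysis, assumed here, sweeps each input across its range while holding the others fixed, then normalizes the resulting output spreads. The sketch below uses a hypothetical toy model and hypothetical input ranges.

```python
def relative_contributions(model, ranges, steps=11):
    """Percent contribution of each input to the model output.

    Assumption: input i is swept across its range while the other inputs
    stay at their midpoints; the spread (max - min) of the output is
    recorded, and spreads are normalized so that they sum to 100.
    """
    midpoints = [(lo + hi) / 2 for lo, hi in ranges]
    spreads = []
    for i, (lo, hi) in enumerate(ranges):
        outputs = []
        for k in range(steps):
            point = list(midpoints)
            point[i] = lo + (hi - lo) * k / (steps - 1)
            outputs.append(model(point))
        spreads.append(max(outputs) - min(outputs))
    total = sum(spreads)
    if total == 0:
        raise ValueError("model output does not respond to any input")
    return [100 * s / total for s in spreads]


def toy_model(x):
    # Hypothetical stand-in for a fitted model; the third input is ignored.
    return 2.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


shares = relative_contributions(toy_model, [(0.0, 1.0)] * 3)
print([round(s, 2) for s in shares])
```

Under this formulation, an input that the fitted expression never uses receives a share of exactly 0%, which is one way results such as pH (0%) for TDS or Mg²⁺ (0%) for EC could arise.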
The capillary length (λs) and time (ts) are dynamic scalars that emerge routinely in the infiltration problem when gravitational and pressure gradient forces are involved. During drainage, however, capillary gradients oppose gravity and retain soil moisture close to the surface. In this case, the pull of capillary gradients increases with drainage time and offsets gravity, resulting in a quasi‐hydrostatic pressure distribution and a negligibly small drainage flux in the profile. In this paper, it is proposed to anchor the dynamic concept of field capacity (the attainment of a negligibly small drainage flux) in the physics of soil moisture redistribution as influenced by gravity and capillary forces. As with infiltration, this dynamic approach grounds the concept of field capacity in soil hydrology and allows its estimation from readily measured intrinsic physical characteristics such as λs, ts, and Ks. Finally, we exploit an analytical solution by Broadbridge and White (1988, https://doi.org/10.1029/WR024i001p00145) to track the drainage front as soil water redistributes in an initially saturated soil profile. Although initially fast, the downward-migrating drainage front decelerates with time, reaching a near-steady-state condition at t ≈ 1,000ts. Quasi‐hydrostatic matric head and water content profiles develop above the drainage front.
The prediction accuracy of machine learning (ML) models may depend not only on the input parameters and training dataset, but also on whether an ensemble or an individual learning model is selected. The present study compares individual supervised ML models, namely gene expression programming (GEP) and artificial neural network (ANN), with an ensemble learning model, random forest (RF), for predicting river water salinity in terms of electrical conductivity (EC) and total dissolved solids (TDS) in the Upper Indus River basin, Pakistan. The proposed models were trained and tested using a dataset of seven input parameters chosen on the basis of significant correlation. The ensemble RF model was optimized by producing 20 sub-models and choosing the most accurate one. The goodness-of-fit of the models was assessed through well-known statistical indicators, such as the coefficient of determination (R²), mean absolute error (MAE), root mean squared error (RMSE), and Nash–Sutcliffe efficiency (NSE). The results demonstrated a strong association between inputs and modeling outputs, with R² values of 0.96, 0.98, and 0.92 for the GEP, RF, and ANN models, respectively. The comparative performance of the proposed methods showed the relative superiority of RF over GEP and ANN. Among the 20 RF sub-models, the most accurate ones yielded R² values of 0.941 and 0.938, with 70 and 160 estimators, respectively. The lowest RMSE values of 1.37 and 3.1 were yielded by the ensemble RF model on training and testing data, respectively. The results of the sensitivity analysis demonstrated that HCO₃⁻ is the most influential variable, followed by Cl⁻ and SO₄²⁻, for both EC and TDS. The assessment of the models on external criteria confirmed that the results of all the aforementioned techniques generalize. In conclusion, the outcome of the present research indicates that the RF model with selected key parameters could be prioritized for water quality assessment and management.
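The goodness-of-fit indicators named above have standard definitions, sketched here in plain Python. Two things are assumptions: R² is computed as the squared Pearson correlation between observed and predicted values (the abstract does not say which definition was used), and the sample values are hypothetical.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of obs about its mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst


def r_squared(obs, pred):
    """Squared Pearson correlation (one common definition of R^2)."""
    mean_o, mean_p = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(mae(observed, predicted), rmse(observed, predicted))
print(nse(observed, predicted), r_squared(observed, predicted))
```

NSE and the correlation-based R² can differ: a biased model may correlate perfectly with observations yet score a low NSE, which is one reason to report both.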
Water contamination is a worldwide problem that threatens public health, environmental protection, and agricultural productivity. The distinctive attributes of machine learning (ML)-based modelling can provide in-depth insight into growing water quality challenges. This study presents the development of a multi-expression programming (MEP) based predictive model for water quality parameters, i.e., electrical conductivity (EC) and total dissolved solids (TDS), in the upper Indus River at two different outlet locations using 360 readings collected on a monthly basis. The optimized MEP models were assessed using different statistical measures, i.e., coefficient of determination (R²), root-mean-square error (RMSE), mean-absolute error (MAE), root-mean-square-logarithmic error (RMSLE), and mean-absolute-percent error (MAPE). The results show that R² in the testing phase (on unseen data) for the EC-MEP and TDS-MEP models is above 0.90, i.e., 0.9674 and 0.9725, respectively, reflecting high accuracy and generalized performance. The error measures are also low. According to the MAPE statistics, both MEP models show an “excellent” performance in all three stages. In comparison with traditional non-linear regression models (NLRMs), the developed machine learning models have good generalization capabilities. The sensitivity analysis of the developed MEP models with regard to the significance of each input on the forecasted water quality parameters suggests that Cl and HCO3 have substantial impacts on the predictions of the MEP models (EC and TDS), with a sensitivity index above 0.90, whereas the influence of Na is less prominent. The results of this research suggest that the development of intelligent models for EC and TDS is cost-effective and viable for the evaluation and monitoring of river water quality.
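Of the measures listed above, MAPE and RMSLE are the two scale-free ones. The sketch below uses their standard definitions, with the common `log(1 + x)` form for RMSLE; the sample values are hypothetical, and the abstract does not say which threshold its "excellent" MAPE rating uses.

```python
import math


def mape(obs, pred):
    """Mean absolute percent error, in percent. Observed values must be nonzero."""
    return 100 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def rmsle(obs, pred):
    """Root mean squared logarithmic error, using log(1 + x).

    Values must be greater than -1; concentrations and conductivities
    are non-negative, so this holds for water quality data.
    """
    squared = [(math.log1p(p) - math.log1p(o)) ** 2 for o, p in zip(obs, pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical TDS values (ppm): each prediction is off by 10%.
observed_tds = [100.0, 200.0]
predicted_tds = [110.0, 180.0]
print(mape(observed_tds, predicted_tds))
print(rmsle(observed_tds, predicted_tds))
```

Because both measures are relative, they allow the EC and TDS models to be compared even though the two quantities are reported in different units.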
One of the most valuable approaches in spatial analysis for understanding the hydrological response of a region or a watershed is the analysis of land use land cover (LULC) dynamicity. The present case study analyzes LULC dynamicity by using digital Landsat TM and Landsat OLI data to classify the Kolkata Metropolitan Development Authority (KMDA) area into seven classes, with over 90% classification accuracy, for decadal assessments over 30 years (1989, 1999, 2009, and 2019). To analyze the LULC transformation, this study used the change index, the DEMATEL method for analyzing the cause-effect relationship among the LULC classes, the Jaccard Similarity Index for measuring the similarity among the LULC classes, and the Adherence Index for measuring the consistency of the LULC classes after the transition. In more detail, the present study considers how urban land use is expanding at the expense of other land uses. In addition, the shifting pattern of the mean centers of the LULC classes through time gives significant insight into the LULC dynamics over the 30-year span. The current study of LULC dynamicity and transformation patterns over 30 years in the KMDA area is expected to assist land and urban planners, engineers, and administrators in making sustainable decisions and policies that ensure inclusive urbanization, accommodating population growth while minimizing the impact on natural resources within the study area.
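The Jaccard Similarity Index mentioned above is the size of the intersection of two sets divided by the size of their union. The abstract does not say exactly how it was applied, so the sketch below makes an assumption: it compares the cells occupied by one class in two classified maps of the same area. The maps, class labels, and function name are hypothetical.

```python
def jaccard_index(map_a, map_b, land_class):
    """Jaccard similarity of the cells labeled `land_class` in two maps.

    Each map is a flat sequence of class labels on the same grid, so
    index i refers to the same cell in both maps.
    """
    cells_a = {i for i, label in enumerate(map_a) if label == land_class}
    cells_b = {i for i, label in enumerate(map_b) if label == land_class}
    union = cells_a | cells_b
    if not union:
        raise ValueError(f"class {land_class!r} is absent from both maps")
    return len(cells_a & cells_b) / len(union)


# Hypothetical four-cell classified maps for two dates.
lulc_1989 = ["urban", "water", "vegetation", "urban"]
lulc_2019 = ["urban", "urban", "vegetation", "water"]
urban_similarity = jaccard_index(lulc_1989, lulc_2019, "urban")
print(urban_similarity)
```

A value of 1 means the class occupies exactly the same cells at both dates, and a value near 0 means it has almost entirely relocated.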