Abstract. Groundwater is crucial for domestic supplies in the Sahel, where the strategic importance of aquifers is expected to increase in the coming years due to climate change. Groundwater potential mapping is gaining recognition as a valuable tool to underpin water management practices in the region and hence to improve access to drinking water. This paper presents a machine learning method to map groundwater potential and illustrates it through an application to two administrative regions of Mali. A set of explanatory variables for the presence of groundwater is developed first. Several scaling methods (standardization, normalization, maximum absolute value and min-max scaling) are used to avoid the pitfalls associated with the reclassification of explanatory variables. A total of 20 supervised learning classifiers are then trained and tested on a large borehole database (n = 3,345) in order to find meaningful correlations between the presence or absence of groundwater and the explanatory variables. This process identifies noisy, collinear and counterproductive variables, which are then excluded from the input dataset. Tree-based algorithms, including the AdaBoost, Gradient Boosting, Random Forest, Decision Tree and Extra Trees classifiers, consistently outperformed the other algorithms (accuracy > 0.85), whereas maximum absolute value and standardization proved the most efficient methods to scale the explanatory variables. Borehole flow rate data are then used to calibrate the results beyond standard machine learning metrics, thus adding robustness to the predictions. The southern part of the study area was identified as the better groundwater prospect, which is consistent with its geological and climatic setting.
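The four scaling options named above can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: it assumes each explanatory variable is a single numeric column, and it assumes "normalization" means dividing the column by its Euclidean (L2) norm, which the paper may define differently. The helper names and the sample slope values are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Callable, Dict, List, Sequence

Scaler = Callable[[Sequence[float]], List[float]]


def _check_spread(value: float, what: str) -> None:
    # A constant column carries no information and would cause division by zero.
    if value == 0:
        raise ValueError(f"cannot scale: {what} is zero")


def standardize(values: Sequence[float]) -> List[float]:
    """Zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    _check_spread(sigma, "standard deviation")
    return [(v - mu) / sigma for v in values]


def normalize(values: Sequence[float]) -> List[float]:
    """Divide by the column's Euclidean (L2) norm (assumed definition)."""
    norm = math.sqrt(sum(v * v for v in values))
    _check_spread(norm, "L2 norm")
    return [v / norm for v in values]


def max_abs_scale(values: Sequence[float]) -> List[float]:
    """Divide by the largest absolute value; results lie in [-1, 1]."""
    peak = max(abs(v) for v in values)
    _check_spread(peak, "maximum absolute value")
    return [v / peak for v in values]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Map the column minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    _check_spread(hi - lo, "range")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical registry so that every scaler can be tried in turn.
SCALERS: Dict[str, Scaler] = {
    "standardization": standardize,
    "normalization": normalize,
    "max_abs": max_abs_scale,
    "min_max": min_max_scale,
}

# Hypothetical values of one explanatory variable (e.g. slope, in degrees).
slope = [1.0, 2.0, 4.0, 8.0]
for name, scaler in SCALERS.items():
    print(f"{name:16s}", [round(v, 3) for v in scaler(slope)])
```

All four transforms are affine maps with a positive slope, so they preserve the ordering and relative spacing of the original values. Reclassification, by contrast, collapses a continuous variable into a few expert-chosen classes, which is where subjective bias can enter.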
From a methodological standpoint, the outcomes lead to three major conclusions: (1) because there is no a priori way to know which algorithm will perform best on a given dataset, we advocate training a large number of machine learning classifiers and subsequently picking the best performers for ensembling; (2) standard machine learning metrics may be of limited value when appraising map outcomes and should be complemented with hydrogeological indicators whenever possible; and (3) scaling the explanatory variables helps to minimize bias arising from expert judgement while maintaining robust predictive capabilities.
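Conclusion (1) can be illustrated with a minimal sketch, assuming scikit-learn: a pool of candidate classifiers is ranked by cross-validated accuracy, and the top performers are combined in a soft-voting ensemble. Everything here is an illustrative assumption rather than the paper's setup: the data are synthetic (`make_classification` stands in for a borehole table with positive and dry boreholes), the seven-model pool is much smaller than a full comparison, and the hypothetical helpers `rank_classifiers` and `build_ensemble`, the choice `top_k=3` and soft voting are not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a borehole table: rows are boreholes, columns are
# explanatory variables, y = 1 for a positive borehole and 0 for a dry one.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

base_models = {
    "adaboost": AdaBoostClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "extra_trees": ExtraTreesClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
}
# Scaling sits inside each pipeline, so it is refitted on every training fold.
candidates = {name: make_pipeline(MaxAbsScaler(), m) for name, m in base_models.items()}


def rank_classifiers(models, X, y, cv=5):
    """Hypothetical helper: (name, mean CV accuracy) pairs, best first."""
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def build_ensemble(models, ranking, top_k=3):
    """Hypothetical helper: soft-voting ensemble of the top_k ranked models."""
    chosen = [(name, models[name]) for name, _ in ranking[:top_k]]
    return VotingClassifier(estimators=chosen, voting="soft")


ranking = rank_classifiers(candidates, X_train, y_train)
ensemble = build_ensemble(candidates, ranking).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, ensemble.predict(X_test))

for name, score in ranking:
    print(f"{name:20s} {score:.3f}")
print(f"ensemble test accuracy: {test_accuracy:.3f}")
```

Ranking on the training split and scoring the ensemble on a held-out split keeps model selection from leaking into the reported accuracy. If the explanatory variables were stacked as one row per map cell, `ensemble.predict_proba(...)[:, 1]` would give a per-cell probability that could be mapped as groundwater potential; that mapping step is likewise an assumption of this sketch.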