Numerical models are frequently used for the regional quantification of groundwater recharge. However, a wide range of models is available that represent the land surface with varying degrees of complexity, and these are rarely tested against observations at the field scale. We compared four models that simulate potential recharge at four intensively monitored sites with different vegetation and soil types in two adjacent catchments. These models were Penman–Grindley, UN Food and Agriculture Organization, SPAtial Distributed Evaporation and Joint UK Land Environment Simulator. Standardized, unoptimized land surface datasets and pertinent literature were used for parameterization to reflect practice in regional water resource management and planning in the UK. The models were validated against soil moisture observations at all sites, and against observed transpiration and interception and calculated total evaporation over a year at a woodland site. Soil moisture observations were generally reproduced well, but there were significant differences in how the models apportioned precipitation through the hydrological cycle. This demonstrates that soil moisture data alone are not a good diagnostic for groundwater recharge models. The models produced significantly different potential recharge at both grassland sites, although simulated average annual potential recharge varied by only 15% at the grassland site on permeable soil. Soil moisture contents were reproduced least accurately at the woodland sites, and there were large differences in simulated potential recharge at both. This predominantly resulted from varied and inaccurate simulation of evaporation, particularly in the form of interception losses where these were explicitly represented in the models. Differences in model structure, such as runoff representation, and in parameter selection also influenced all results. Hydrological Processes © 2013 John Wiley & Sons, Ltd.