Scenarios of flood inundation are traditionally simulated by numerically solving the partial differential equations (PDEs) that govern fluid dynamics, with the initial and boundary conditions derived from observational data. Although such hydrodynamic simulations have proven highly valuable for improving flood management and risk mitigation (Karim et al., 2023), their high demand on computing time and hardware resources limits their applicability in near-real-time or emergency management.
Drastically improved efficiency can be achieved by leveraging deep artificial neural networks (ANNs). In a recently proposed methodology known as physics-informed machine learning (PIML; Karniadakis et al., 2021), ANNs are trained to obtain numerical solutions based on the PDEs and limited observational data. A more mature and realistic category of methods, on the other hand, employs ANNs to emulate traditional hydrodynamic simulators: an ANN is trained to predict the output of a physics-based simulator given its inputs. This category, often referred to as surrogate models, includes algorithms that predict flood inundation at specific future time points as well as those that generate series of predictions.
Compared with PIML, a key concern regarding ANN-based surrogate models is their generalizability. If the simulation data used to train a surrogate model are insufficiently diverse (in terms of the ranges of scenarios and conditions covered) or inadequately representative, it is unclear whether the model can effectively learn the underlying physics in the absence of physics-based constraints.
Recently, we proposed an ANN-based framework for rapid inundation modelling in large regions (Tychsen-Smith et al., 2023). This framework takes into account the local elevation, surface roughness, river inflows and current water heights, and predicts the water heights at the subsequent time point. The prediction is performed at grid points in a rolling-forward fashion, i.e. each predicted water-height field serves as the input for the next time step.
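The rolling-forward (autoregressive) prediction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's actual implementation: `rollout` and `toy_step` are hypothetical names, the inputs are assumed to be 2-D grids of equal shape, and `toy_step` is a trivial placeholder standing in for a trained ANN.

```python
import numpy as np
from typing import Callable, Sequence

# A one-step model maps (elevation, roughness, inflow, current heights) -> next heights.
StepFn = Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray]


def rollout(step: StepFn, elevation: np.ndarray, roughness: np.ndarray,
            inflows: Sequence[np.ndarray], h0: np.ndarray) -> np.ndarray:
    """Roll a one-step model forward; returns heights with shape (T + 1, ny, nx)."""
    heights = [h0]
    h = h0
    for inflow in inflows:
        # Static inputs (elevation, roughness) are reused at every step, while the
        # model's own previous prediction h is fed back as the current state.
        h = step(elevation, roughness, inflow, h)
        heights.append(h)
    return np.stack(heights)


def toy_step(elevation: np.ndarray, roughness: np.ndarray,
             inflow: np.ndarray, h: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder for a trained ANN: it ignores elevation and
    # roughness and simply adds the inflow, clipping heights at zero.
    return np.maximum(h + inflow, 0.0)


# Usage: a 4 x 4 grid with a constant point inflow in one corner for 3 steps.
elev = np.zeros((4, 4))
rough = np.full((4, 4), 0.03)
h0 = np.zeros((4, 4))
q = np.zeros((4, 4))
q[0, 0] = 0.5
traj = rollout(toy_step, elev, rough, [q, q, q], h0)
```

Because each step consumes the previous prediction rather than simulator output, errors can accumulate over the rollout, which is one reason generalizability to unseen conditions matters for this kind of surrogate.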
In this paper, we present case studies on the generalizability of this framework. Specifically, we aim to answer the following questions: How well does a model based on our framework perform on unseen floods at the same location? How well does a model trained at one location perform at a different location? How much does the performance on unseen floods at unseen locations improve when a simulation at an additional location is added to the training data? Finally, we propose measures aimed at improving the generalizability of ANN-based models.