Reliable water level prediction models are vital for urban flood control and planning. In this paper, we develop hybrid models (GA-XGBoost and DE-XGBoost) that couple two evolutionary algorithms, a genetic algorithm (GA) and a differential evolution (DE) algorithm, with the extreme gradient boosting (XGBoost) model for hourly water level prediction. The Jungrang urban basin located on the Han River, South Korea, was selected as a case study for the proposed models. Hourly rainfall and water level data collected between 2003 and 2020 were used to construct the selected models and evaluate their performance. For comparison, two other tree-based models were chosen: classification and regression tree (CART) and random forest (RF) models. The results showed that the two hybrid models, GA-XGBoost and DE-XGBoost, outperformed RF and CART in the multistep-ahead prediction of water level: the relative errors of the hybrid models ranged from 2.18% to 9.21%, compared with 3.76% to 10.41% for RF and 2.99% to 11.88% for CART. Other performance measures also supported this result. In general, the GA-XGBoost and DE-XGBoost models performed similarly, with only small differences between them. The CART model was not preferable for multistep-ahead water level prediction, even though it yielded the lowest Akaike information criterion (AIC) value. This study indicates that, despite some drawbacks in long step-ahead prediction and model complexity, hybrid XGBoost models may be superior to many existing models for hourly water level prediction.
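
One common way to couple an evolutionary algorithm with a boosted-tree model is to let it search the model's hyperparameters; whether the hybrid models here are coupled in exactly this way is an assumption. The minimal sketch below runs a DE/rand/1/bin search over a small hyperparameter box. The search space, the parameter ranges, the control settings (`f`, `cr`, `pop_size`, `generations`) and the `validation_error` stand-in are all hypothetical and chosen only for illustration. In practice the objective would train an XGBoost regressor on the rainfall and water level data and return its validation error.

```python
import random

# Hypothetical search space: (lower, upper) bounds for three XGBoost-style
# hyperparameters. Names and ranges are assumptions for illustration only.
# max_depth is treated as continuous here and would be rounded before use.
BOUNDS = {
    "learning_rate": (0.01, 0.5),
    "max_depth": (2.0, 12.0),
    "subsample": (0.5, 1.0),
}


def validation_error(params):
    """Stand-in objective (assumption, not the study's objective).

    A hybrid setup would train a model with `params` and return its
    validation error. This is a smooth bowl with a known minimum so the
    sketch runs without data or third-party packages.
    """
    target = {"learning_rate": 0.1, "max_depth": 6.0, "subsample": 0.8}
    return sum(((params[k] - target[k]) / (hi - lo)) ** 2
               for k, (lo, hi) in BOUNDS.items())


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           f=0.7, cr=0.9, seed=0):
    """Minimise `objective` over `bounds` with a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    keys = list(bounds)
    # Initialise the population uniformly inside the bounds.
    pop = [{k: rng.uniform(*bounds[k]) for k in keys} for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]

    for _ in range(generations):
        next_pop, next_scores = list(pop), list(scores)
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.choice(keys)  # at least one gene comes from the mutant
            trial = {}
            for k in keys:
                if k == forced or rng.random() < cr:
                    lo, hi = bounds[k]
                    value = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    trial[k] = min(max(value, lo), hi)  # clip to the bounds
                else:
                    trial[k] = pop[i][k]
            # Greedy selection: the trial replaces the target if no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                next_pop[i], next_scores[i] = trial, trial_score
        pop, scores = next_pop, next_scores

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


best_params, best_error = differential_evolution(validation_error, BOUNDS)
```

A GA-based variant would keep the same objective and bounds and replace the mutation and crossover step with selection, crossover and mutation operators; only the search strategy differs, which is consistent with the two hybrids performing similarly.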