“…It is essential to carefully evaluate the performance of different ensemble models and select the one that provides the best trade-off among bias, variance, accuracy, diversity, stability, generalization, and computational cost [67,91,92,159]. The final stage would evaluate and validate the performance of the selected ensemble model using appropriate evaluation metrics, such as the mean absolute error (MAE) [21,83,160], root-mean-squared error (RMSE) [106,161], correlation coefficient (CC) [49,83,106,161], and coefficient of determination (R-squared) [42,43,161,162,163]. The following section covers some of the fundamental concepts that are considered when evaluating a neural network ensemble for storm surge prediction.…”
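
As an illustration of the four metrics named in the passage (MAE, RMSE, CC, and R-squared), the following is a minimal sketch in Python with NumPy. It is not taken from the cited work: the function name `regression_metrics`, the sample surge values, and the use of a simple mean over ensemble members are all assumptions made for the example, and CC is taken here to be the Pearson correlation coefficient.

```python
import numpy as np


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, Pearson CC, and R-squared for paired values.

    Hypothetical helper for illustration; not part of any cited method.
    """
    y = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if y.shape != p.shape or y.ndim != 1 or y.size < 2:
        raise ValueError("observed and predicted must be 1-D, same length, >= 2 points")

    err = p - y
    mae = float(np.mean(np.abs(err)))            # mean absolute error
    rmse = float(np.sqrt(np.mean(err ** 2)))     # root-mean-squared error
    cc = float(np.corrcoef(y, p)[0, 1])          # Pearson correlation coefficient

    ss_res = float(np.sum(err ** 2))             # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    # R-squared is undefined when the observations are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "cc": cc, "r2": r2}


# Made-up surge heights (m) and three made-up member predictions.
observed = np.array([1.0, 2.0, 3.0, 4.0])
members = np.array([
    [1.2, 2.1, 2.7, 4.4],
    [0.8, 1.9, 3.1, 3.8],
    [1.3, 2.3, 2.9, 4.1],
])

# Assumption: the ensemble output is the unweighted mean of its members.
ensemble_pred = members.mean(axis=0)
metrics = regression_metrics(observed, ensemble_pred)
print(metrics)
```

Lower MAE and RMSE indicate smaller errors, while CC and R-squared closer to 1 indicate closer agreement with the observations. Note that R-squared computed this way can be negative when the predictions are worse than simply predicting the observed mean.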