In recent years, the use of deep learning methods has increased rapidly in many research fields, and they have become a powerful tool within the climate science community. Deep learning methods have been successfully applied to different tasks, such as the identification of atmospheric patterns, the classification of weather extremes, and weather forecasting. However, due to the inherent complexity of atmospheric processes, simulating natural processes with deep learning models remains challenging, particularly in the case of weather extremes. Therefore, a thorough evaluation of their performance and robustness in predicting precipitation fields is still needed, especially for extreme precipitation events, which can have devastating consequences in terms of infrastructure damage, economic losses, and even loss of life. In this study, we present a comprehensive evaluation of a set of deep learning architectures for simulating precipitation, including heavy precipitation events (>95th percentile) and extreme events (>99th percentile), over the European domain. Among the architectures analyzed here, the U‐Net outperformed the other networks in simulating precipitation events. In particular, we found that a simplified version of the original U‐Net with two encoder‐decoder levels generally achieved skill scores similar to those of deeper versions for predicting precipitation extremes, while significantly reducing the overall complexity and computing resources required. We further assess how the model makes its predictions using attribution heatmaps from the Layer‐wise Relevance Propagation (LRP) explainability method.
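As an illustration of the percentile-based event definitions used above (heavy events above the 95th percentile, extreme events above the 99th), the following is a minimal sketch rather than the procedure of this study. It assumes that thresholds are computed independently for each grid cell over the full time series; the function name `percentile_event_masks`, the array layout, and the synthetic gamma-distributed data are hypothetical and serve only as a stand-in for a real precipitation dataset.

```python
import numpy as np


def percentile_event_masks(precip, heavy_q=95.0, extreme_q=99.0):
    """Flag heavy and extreme precipitation events per grid cell.

    precip: array of shape (time, lat, lon), e.g. daily totals in mm.
    Returns two boolean arrays (heavy, extreme) with the same shape.
    """
    # Assumption: each grid cell gets its own threshold, computed over the
    # whole time axis. The study may define the thresholds differently
    # (for example, over wet days only).
    heavy_thr = np.percentile(precip, heavy_q, axis=0)
    extreme_thr = np.percentile(precip, extreme_q, axis=0)

    # Broadcasting compares every time step against its cell's threshold.
    heavy = precip > heavy_thr
    extreme = precip > extreme_thr
    return heavy, extreme


# Synthetic, skewed data standing in for a real precipitation field.
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=4.0, size=(1000, 4, 5))

heavy, extreme = percentile_event_masks(precip)
```

With this definition, roughly 5% of the time steps at each grid cell are labeled heavy and roughly 1% extreme, and every extreme event is also a heavy event. This class imbalance is part of what makes the extremes difficult to predict.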