In the era of deep learning, methods for gender recognition from face images achieve remarkable performance on most standard datasets. However, common experimental analyses do not take into account that the face images given as input to the neural networks are often affected by strong corruptions that are not always represented in standard datasets. In this paper, we propose an experimental framework for gender recognition "in the wild". We produce corrupted versions of the popular LFW+ and GENDER-FERET datasets, which we call LFW+C and GENDER-FERET-C, and evaluate the accuracy of nine different network architectures in the presence of specific, suitably designed corruptions; in addition, we perform an experiment on the MIVIA-Gender dataset, recorded in real environments, to analyze the effects of mixed image corruptions occurring in the wild. The experimental analysis demonstrates that the robustness of the considered methods can be further improved, since all of them suffer a performance drop on images collected in the wild or artificially corrupted. Based on the experimental results, we provide useful insights for choosing the best currently available architecture under specific real conditions. The proposed experimental framework, whose code is publicly available, is general enough to be applied to other datasets as well; thus, it can serve as a reference for future investigations.
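To make the idea of producing a corrupted dataset version concrete, the following is a minimal sketch of how a clean dataset could be mirrored with a chosen corruption applied to every image. It is not the authors' released code: the corruption types, severity parameters, and directory paths are illustrative assumptions only.

```python
from pathlib import Path

import numpy as np
from PIL import Image, ImageFilter


def corrupt_gaussian_noise(img: Image.Image, sigma: float = 25.0) -> Image.Image:
    """Add zero-mean Gaussian noise with standard deviation `sigma` (pixel units)."""
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))


def corrupt_gaussian_blur(img: Image.Image, radius: float = 2.0) -> Image.Image:
    """Blur the image with a Gaussian kernel of the given radius (pixels)."""
    return img.filter(ImageFilter.GaussianBlur(radius=radius))


def build_corrupted_copy(src_dir: str, dst_dir: str, corruption) -> None:
    """Walk `src_dir`, apply `corruption` to every image, mirror the tree in `dst_dir`."""
    for src_path in Path(src_dir).rglob("*.jpg"):
        dst_path = Path(dst_dir) / src_path.relative_to(src_dir)
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        corruption(Image.open(src_path).convert("RGB")).save(dst_path)


if __name__ == "__main__":
    # Hypothetical paths: pointing them at a local copy of LFW+ would yield an LFW+C-like version.
    build_corrupted_copy("datasets/lfw_plus", "datasets/lfw_plus_c/blur", corrupt_gaussian_blur)
```

Under this assumed setup, each corruption (and severity level) produces a separate mirrored copy of the dataset, so the same evaluation pipeline can be run unchanged on the clean and corrupted versions and the accuracy drop attributed to a specific corruption.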