Face gender recognition has many useful applications in human–robot interaction, where it can improve the overall user experience. Support vector machines (SVMs) and convolutional neural networks (CNNs) have been used successfully in this domain. Researchers have shown increasing interest in comparing and combining different feature extraction paradigms, including deep-learned features, hand-crafted features, and fusions of the two. However, related research in face gender recognition has mostly been restricted to limited comparisons: either deep-learned and fused features with the CNN model, or deep-learned features alone with the CNN and SVM models. In this work, we perform a comprehensive comparative study of the classification performance of two widely used learning models (CNN and SVM) when each is combined with seven features spanning hand-crafted, deep-learned, and fused features. The experiments were performed on two challenging unconstrained datasets, Adience and Labeled Faces in the Wild. Further, we used t-tests to assess the statistical significance of the differences in performance with respect to accuracy, F-score, and area under the curve. Our results show that SVM performed best with fused features, whereas CNN performed best with deep-learned features. CNN outperformed SVM significantly at p < 0.05.
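
The significance testing mentioned above can be illustrated with a minimal sketch. The abstract does not state whether the t-tests were paired or independent, so this example assumes a paired two-sided t-test over per-fold accuracies. The function name `paired_t_statistic` and the accuracy lists are hypothetical placeholders, not results from the study. The critical value 2.776 is the standard two-sided 5% threshold of the t-distribution with 4 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    # Work on the per-fold differences between the two models.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Placeholder per-fold accuracies (illustrative only, NOT the paper's results).
cnn_acc = [0.91, 0.93, 0.92, 0.94, 0.92]
svm_acc = [0.88, 0.90, 0.89, 0.90, 0.90]

t_stat, dof = paired_t_statistic(cnn_acc, svm_acc)

# Two-sided critical value of Student's t at alpha = 0.05 with 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.3f}, dof = {dof}, significant at 0.05: {significant}")
```

The same comparison would be repeated for each metric (accuracy, F-score, and area under the curve). With a statistics library available, an exact p-value could replace the fixed critical value.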