In communication systems, there are many tasks, like modulation recognition, which rely on Deep Neural Networks (DNNs) models. However, these models have been shown to be susceptible to adversarial perturbations, namely imperceptible additive noise crafted to induce misclassification. This raises questions about the security but also the general trust in model predictions. We propose to use adversarial training, which consists of fine-tuning the model with adversarial perturbations, to increase the robustness of automatic modulation recognition (AMC) models. We show that current state-of-the-art models benefit from adversarial training, which mitigates the robustness issues for some families of modulations. We use adversarial perturbations to visualize the features learned, and we found that in robust models the signal symbols are shifted towards the nearest classes in constellation space, like maximum likelihood methods. This confirms that robust models not only are more secure, but also more interpretable, building their decisions on signal statistics that are relevant to modulation recognition.