Radio Frequency Interference (RFI) detection and characterization play a critical role in ensuring the security of wireless communication networks. Advances in Machine Learning (ML) have led to the deployment of many robust techniques for dealing with various types of RFI. To sidestep the complicated feature-extraction step that conventional ML requires, this paper proposes an efficient end-to-end method that uses recent advances in deep learning to extract the appropriate features of the RFI signal. Moreover, this study exploits transfer learning to determine both the type of the received RFI signals and their modulation types. To this end, the scalogram of the received signal is used as the input to a pre-trained convolutional neural network (CNN), followed by a fully-connected classifier. This study considers a digital video stream as the signal of interest (SoI), transmitted in real-time satellite-to-ground communication using the DVB-S2 standard. To create the RFI dataset, the SoI is combined with three well-known jammers, namely continuous-wave interference (CWI), multi-continuous-wave interference (MCWI), and chirp interference (CI). Four well-known pre-trained CNN architectures, namely AlexNet, VGG-16, GoogLeNet, and ResNet-18, are investigated as feature extractors that recognize the visual RFI patterns directly from pixel images with minimal preprocessing. Moreover, the robustness of the proposed classifiers is evaluated on data generated at different signal-to-noise ratios (SNRs).
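As an illustration of the dataset construction described above, the following minimal sketch generates the three jammer types (CWI, MCWI, CI) at complex baseband and combines one of them with a signal of interest at a chosen SNR. It is not the authors' generator: the sample rate, tone frequencies, and sweep range are arbitrary, the jammer-to-signal ratio parameter `jsr_db` is an assumption that does not appear in the abstract, and the SoI is a placeholder stream of random QPSK symbols rather than a real DVB-S2 waveform.

```python
import cmath
import math
import random


def power(x):
    """Average power of a complex sample sequence."""
    return sum(abs(v) ** 2 for v in x) / len(x)


def scale_to_power(x, target):
    """Rescale x so that its average power equals target."""
    gain = math.sqrt(target / power(x))
    return [gain * v for v in x]


def cwi(n, fs, f0):
    """Continuous-wave interference: one complex tone at f0 Hz."""
    return [cmath.exp(2j * math.pi * f0 * k / fs) for k in range(n)]


def mcwi(n, fs, freqs):
    """Multi-tone CWI: sum of several tones, rescaled to unit power."""
    tones = [sum(cmath.exp(2j * math.pi * f * k / fs) for f in freqs)
             for k in range(n)]
    return scale_to_power(tones, 1.0)


def chirp(n, fs, f_start, f_stop):
    """Chirp interference: linear sweep from f_start to f_stop over n samples."""
    rate = (f_stop - f_start) * fs / n  # sweep rate in Hz per second
    out = []
    for k in range(n):
        t = k / fs
        out.append(cmath.exp(2j * math.pi * (f_start * t + 0.5 * rate * t * t)))
    return out


def qpsk(n, rng):
    """Placeholder SoI: unit-power random QPSK symbols (not a DVB-S2 frame)."""
    s = 1 / math.sqrt(2)
    return [complex(rng.choice((-s, s)), rng.choice((-s, s))) for _ in range(n)]


def make_example(soi, jammer, jsr_db, snr_db, rng):
    """Return SoI + jammer + white Gaussian noise.

    jsr_db (jammer-to-signal ratio) is a hypothetical knob for this sketch;
    snr_db sets the SoI power relative to the noise power.
    """
    p_soi = power(soi)
    jam = scale_to_power(jammer, p_soi * 10 ** (jsr_db / 10))
    noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in soi]
    noise = scale_to_power(noise, p_soi / 10 ** (snr_db / 10))
    return [s + j + w for s, j, w in zip(soi, jam, noise)]


rng = random.Random(0)
n, fs = 1024, 1.0e6
soi = qpsk(n, rng)
jammers = {
    "CWI": cwi(n, fs, 1.0e5),
    "MCWI": mcwi(n, fs, [5.0e4, 1.0e5, 2.0e5]),
    "CI": chirp(n, fs, 0.0, 2.0e5),
}
dataset = {name: make_example(soi, jam, 0.0, 10.0, rng)
           for name, jam in jammers.items()}
```

In a pipeline like the one outlined in the abstract, each entry of `dataset` would next be converted to a scalogram image (for instance with a continuous wavelet transform) before being passed to the pre-trained CNN; that step is omitted here.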