With the development of the smart grid, the traditional defect detection methods in transmission lines are gradually shifted to the combination of robots or drones and deep learning technology to realize the automatic detection of defects, avoiding the risks and computational costs of manual detection. Lightweight embedded devices such as drones and robots belong to small devices with limited computational resources, while deep learning mostly relies on deep neural networks with huge computational resources. And semantic features of deep networks are richer, which are also critical for accurately classifying morphologically similar defects for detection, helping to identify differences and classify transmission line components. Therefore, we propose a method to obtain advanced semantic features even in shallow networks. Combined with transfer learning, we change the image features (e.g., position and edge connectivity) under self-supervised learning during pre-training. This allows the pre-trained model to learn potential semantic feature representations rather than relying on low-level features. The pre-trained model then directs a shallow network to extract rich semantic features for downstream tasks. In addition, we introduce a category semantic fusion module (CSFM) to enhance feature fusion by utilizing channel attention to capture global and local information lost during compression and extraction. This module helps to obtain more category semantic information. Our experiments on a self-created transmission line defect dataset show the superiority of modifying low-level image information during pre-training when adjusting the number of network layers and embedding of the CSFM. The strategy demonstrates generalization on the publicly available PASCAL VOC dataset. Finally, compared with state-of-the-art methods on the synthetic fog insulator dataset (SFID), the strategy achieves comparable performance with much smaller network depths.