To address transmission defect scenarios, which involve many component types, diverse defect forms, and an uneven distribution of defect sizes, this paper proposes a transmission defect identification algorithm built on the pre-trained ViT large vision model architecture and the ViTDet object detection training algorithm. Specifically, it uses the ViT-Large model as the backbone network and Cascade R-CNN as the detection framework within ViTDet. To improve the identification of small defects in transmission scenes with a large field of view, an image cropping training strategy is integrated: during training, each image is cut into four equal parts with an overlap rate of 20%. At a comparable false detection level, the recognition rate of the large model is about 5% higher than that of the traditional CNN model.
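
The four-part cropping with 20% overlap can be illustrated with a minimal sketch. The text does not specify the tile layout or how the overlap rate is measured, so the sketch assumes a 2x2 grid and takes the overlap as a fraction of the tile side length. The function name `split_2x2_with_overlap` and the rounding choice are hypothetical, and remapping the defect annotations into each tile is not shown.

```python
import math
from typing import List, Tuple

# A crop box in pixel coordinates: (left, upper, right, lower).
Box = Tuple[int, int, int, int]


def split_2x2_with_overlap(width: int, height: int,
                           overlap: float = 0.2) -> List[Box]:
    """Return four equal-size crop boxes arranged in a 2x2 grid.

    Assumption (not stated in the source): `overlap` is the fraction of a
    tile's side that it shares with its neighbour, so two tiles of side t
    span 2*t - overlap*t = width, which gives t = width / (2 - overlap).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")

    # Round up so that the two tiles along each axis always cover the image.
    tile_w = math.ceil(width / (2.0 - overlap))
    tile_h = math.ceil(height / (2.0 - overlap))

    # First tile starts at the origin, second tile ends at the far edge.
    xs = [0, width - tile_w]
    ys = [0, height - tile_h]

    # Row-major order: top-left, top-right, bottom-left, bottom-right.
    return [(x, y, x + tile_w, y + tile_h) for y in ys for x in xs]


if __name__ == "__main__":
    for box in split_2x2_with_overlap(1800, 900):
        print(box)
```

Each box uses the (left, upper, right, lower) convention, so it can be passed directly to a cropping routine such as Pillow's `Image.crop`. Under this definition of overlap, each tile spans roughly 56% of the image along each axis, which enlarges small defects relative to the detector's input size.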