This article presents M-YOLO, a deep learning method based on an improved YOLOv5 model for detecting surface defects on capsules, addressing the particular difficulty of detecting defects on transparent capsules. First, the YOLOv5 backbone is replaced with MobileNetV3, making the model better suited to deployments with limited storage and power budgets. Second, a Cross-channel-H-SPP (CH-SPP) module is designed to enrich the contextual information within the receptive field. Third, a Squeeze-and-Excitation (SE) attention mechanism is incorporated to improve defect detection accuracy. Finally, an improved label assignment strategy is adopted to raise the recall rate. Experimental results on the dataset show significant improvements in both accuracy and speed over the baseline YOLOv5 model. The proposed algorithm meets the requirement of processing (specific data) per second.
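To make the SE attention step concrete, the sketch below shows a generic Squeeze-and-Excitation block in pure Python: each channel is "squeezed" to one scalar by global average pooling, two fully connected layers (ReLU, then sigmoid) turn those scalars into per-channel gates in (0, 1), and each channel is rescaled by its gate. This is only a minimal illustration of the standard SE technique under stated assumptions; the abstract does not specify where SE is inserted in M-YOLO or which reduction ratio it uses, and the function names (`global_avg_pool`, `dense`, `se_block`) and weight values are hypothetical, not trained M-YOLO parameters.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W channel


def global_avg_pool(fmap: List[Matrix]) -> List[float]:
    """Squeeze: average each channel of a C x H x W feature map to a scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer; `weights` has shape (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def se_block(fmap: List[Matrix], w1, b1, w2, b2) -> List[Matrix]:
    """Generic SE block (hypothetical weights supplied by the caller)."""
    z = global_avg_pool(fmap)                                     # squeeze: C values
    h = [max(0.0, v) for v in dense(z, w1, b1)]                   # reduce C -> C/r, ReLU
    s = [1.0 / (1.0 + math.exp(-v)) for v in dense(h, w2, b2)]    # expand C/r -> C, sigmoid
    # Excitation: rescale every value in channel c by its gate s[c].
    return [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, fmap)]


# Usage: a 2-channel 2x2 feature map with reduction ratio r = 2 (hidden size 1).
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
w1, b1 = [[0.5, -0.5]], [0.0]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]
out = se_block(fmap, w1, b1, w2, b2)
```

With these illustrative weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the block reweights channels while leaving the feature map's shape unchanged; in a trained detector the gates would be learned to emphasize channels that respond to defects.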