Safety helmets play a vital role in protecting workers’ heads. To improve the accuracy of helmet detection in complex environments, such as cluttered backgrounds, varying lighting, and varying distances, we propose a safety helmet-wearing detection algorithm based on an improved YOLOv7. In the backbone network, 16-channel features replace the 3-channel RGB features. Structured pruning is applied to the head network, and the loss function is replaced by SIoU. Experiments on the “helmet-head,” “helmet-data,” and “helmet” data sets show that the improved model, YOLOv7_ours, achieves higher mAP and F1 than the Faster R-CNN, YOLOv5, and YOLOv7 series models. On image data covering different application scenarios, light intensities, and color depths, YOLOv7_ours is more stable and more accurate, and it runs at 112.4 FPS (1000/8.9). Building on YOLOv7_ours, we integrated face recognition and text-to-speech (TTS) to provide helmet detection, identity recognition, and automatic voice reminders, and developed a prototype safety helmet-wearing detection system. We verified the feasibility of the detection algorithm and the system in a semifinished product manufacturing workshop.
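The two headline quantities above can be reproduced with simple arithmetic. The following is a minimal sketch, not the authors' evaluation code. It assumes that the 8.9 in "1000/8.9" is the per-image processing time in milliseconds, and it uses the standard definition of F1 as the harmonic mean of precision and recall. The function names and the precision and recall values are hypothetical and chosen only for illustration.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-image processing time in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Assumption: 8.9 ms per image, read from the "(1000/8.9)" in the text.
    print(f"{fps_from_latency_ms(8.9):.1f} FPS")

    # Hypothetical precision and recall, not results reported in the paper.
    print(f"F1 = {f1_score(0.90, 0.80):.3f}")
```

Under that assumption, 1000/8.9 rounds to the 112.4 FPS quoted above.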