Background: Automatic analysis of endoscopic images will play an important role in future robotic spine surgery. This study is designed as a translational study to develop AI models for semantic segmentation of spinal endoscopic instruments and anatomic structures. The aim is to provide a visual-understanding basis of endoscopic images for future intelligent robotic surgery.

Methods: An estimated 500 cases of endoscopic video will be included in the study. Additional data may be collected from the internet for external validation. Video clips containing typical spinal endoscopic instruments and distinct anatomic structures will be extracted. Typical spinal endoscopic instruments will include forceps, bipolar electrocoagulation, drills, and so on. Endoscopic anatomic structures will include the ligament, upper lamina, lower lamina, nerve root, disc, adipofascia, etc. The ratio of the training, validation, and testing sets is initially set at 8:1:1. State-of-the-art algorithms (namely UNet, Swin-UNet, DeepLab-V3, etc.) and a self-developed deep learning algorithm will be used to build the semantic segmentation models. The Dice coefficient (DC), Hausdorff distance (HD), and mean surface distance (MSD) will be used to assess segmentation performance.

Discussions: This protocol is the first to propose a research plan for developing deep learning models that achieve multi-task semantic segmentation of spinal endoscopy images. Automatically recognizing and simultaneously contouring the surgical instruments and anatomic structures will teach the robot to understand the surgical procedures of human surgeons. The research results and the annotated data will be disclosed and published in the near future.
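As an illustration of the evaluation plan, the Dice coefficient mentioned in the Methods can be sketched as follows. This is a minimal example only, assuming binary NumPy masks; the function name and the toy masks are hypothetical and not part of the protocol:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient (DC) between two binary segmentation masks.

    DC = 2 * |P ∩ T| / (|P| + |T|); eps guards against empty masks.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

# Toy example with two overlapping 4x4 masks:
a = np.zeros((4, 4), dtype=np.uint8); a[1:3, 1:3] = 1  # 4 foreground pixels
b = np.zeros((4, 4), dtype=np.uint8); b[1:3, 1:4] = 1  # 6 foreground pixels
# Overlap is 4 pixels, so DC = 2*4 / (4 + 6) = 0.8
```

In practice, DC would be computed per class (instrument or anatomic structure) and averaged over the test set, with HD and MSD reported alongside it as boundary-sensitive complements.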