Vision-based indoor window detection is a crucial technology for autonomous painting robots. Traditional window detection algorithms are vulnerable to illumination interference and have low reliability. Because convolution is an inherently local operation, CNN-based deep learning detection algorithms are limited in their ability to model global context information. To address this issue, an RS-RCNN (ResNet_50 + Swin Transformer RCNN) object detection algorithm is designed on the basis of Faster-RCNN. In this algorithm, the ResNet_50 and Swin Transformer networks are fused as the backbone to extract features. An AAM_HRFPN (Attention Aggregation Module High-Resolution Feature Pyramid Network) multi-feature fusion network with an added linear attention mechanism follows the backbone, and the SIoU loss is adopted for bounding-box regression. These changes improve the network's ability to represent both global context information and local semantic information. They also improve feature fusion efficiency and detection accuracy. Compared with other object detection networks, the proposed network achieves an AP of 0.877, which is 7.4 percentage points higher than the original Faster-RCNN network. The successful application of this method provides a new solution for robots to detect non-spraying areas.
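The abstract names the SIoU loss but does not give its formula. The following is a minimal sketch of the commonly published SIoU formulation for a single pair of boxes. It is offered only to illustrate how the loss combines an IoU term with angle, distance and shape costs, and it is not the authors' implementation. Several details are assumptions: the function name `siou_loss`, the `(x1, y1, x2, y2)` box format, the shape exponent `theta=4.0` and the `eps` guard. The sketch ignores batching and gradients.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """Hypothetical SIoU loss for one pair of axis-aligned boxes (x1, y1, x2, y2).

    Returns 1 - IoU + (distance_cost + shape_cost) / 2, where the distance cost
    is modulated by an angle cost. theta=4.0 is an assumed shape-cost exponent.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # IoU term
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both boxes
    cw = max(px2, tx2) - min(px1, tx1) + eps
    ch = max(py2, ty2) - min(py1, ty1) + eps

    # Offset between box centers and its length
    dx = (tx1 + tx2) / 2 - (px1 + px2) / 2
    dy = (ty1 + ty2) / 2 - (py1 + py2) / 2
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 1 - 2 * sin^2(arcsin(sin_alpha) - pi/4); clamp keeps asin in domain
    sin_alpha = min(abs(dy) / sigma, 1.0)
    angle_cost = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, with gamma = 2 - angle_cost
    gamma = 2 - angle_cost
    rho_x = (dx / cw) ** 2
    rho_y = (dy / ch) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: relative width/height mismatch, raised to theta
    omega_w = abs(pw - tw) / max(pw, tw, eps)
    omega_h = abs(ph - th) / max(ph, th, eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    print(siou_loss((1.0, 1.0, 11.0, 11.0), gt))  # small shift: low loss
    print(siou_loss((5.0, 5.0, 15.0, 15.0), gt))  # larger shift: higher loss
```

Unlike plain IoU loss, the distance and angle terms still give a useful signal when the predicted and ground-truth boxes do not overlap. For that case the sketch returns a value greater than 1.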