To build a building detection network suited to remote sensing images and to reduce the poor detection performance, missed detections and false detections caused by a lack of detailed features, this paper proposes an improved design based on the SegFormer network. Transposed convolutional networks are coupled into the decoder stage, and the loss of feature semantics is addressed by adding dilation (holes) and padding. Multiple normalization and activation layers are cascaded after the convolution layer to regularize against overfitting and keep the feature parameters stable for classification, which further improves the extraction of inter-class differences. Ablation and comparison experiments were conducted on the AISD, MBD and WHU remote sensing image datasets. The control groups of the ablation experiments demonstrate the robustness and effectiveness of the improved mechanism; in the comparison experiments with HRNet, PSPNet, UNet, DeepLabv3+ and the original algorithm, the mIoU on AISD, MBD and WHU improves by up to 12.83%, 28.82% and 14.26%, respectively. The experimental results indicate that the improved method outperforms comparative methods such as UNet, detects building edges more completely, and reduces missed and false detections.
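
The results above are reported as mIoU (mean intersection over union). The abstract does not spell out how the metric is computed, so the following is only a minimal sketch that assumes the standard definition: per-class IoU averaged over the classes that appear in either the prediction or the ground truth. The function names `per_class_counts` and `mean_iou`, and the two-class (background/building) default, are hypothetical choices made for illustration and are not taken from the paper.

```python
def per_class_counts(pred, truth, num_classes):
    """Count intersection and union pixels for each class label.

    pred and truth are equally sized 2D lists of integer class labels.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # A correctly labeled pixel lies in both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # A mislabeled pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return inter, union


def mean_iou(pred, truth, num_classes=2):
    """Average IoU over classes that occur in pred or truth (assumed definition)."""
    inter, union = per_class_counts(pred, truth, num_classes)
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no labeled pixels to evaluate")
    return sum(ious) / len(ious)
```

For a toy 2x4 mask with ground truth `[[0, 0, 1, 1], [0, 1, 1, 1]]` and prediction `[[0, 0, 1, 1], [0, 0, 1, 1]]`, one building pixel is missed: the background IoU is 3/4, the building IoU is 4/5, and `mean_iou` returns their average, 0.775.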