Single remote sensing image super-resolution (SRSI) algorithms based on generative adversarial networks have recently achieved breakthroughs, effectively learning local details to generate more realistic high-resolution remote sensing images (RSIs). However, most of them ignore the large size of RSIs and the small targets they contain, so edge details are lost in the generated images and the results appear strongly blurred. To solve these problems, this paper proposes an improved architecture named DAE²GAN based on an attention mechanism and a transformer. First, to process large-size RSIs, a vision transformer is chosen as the discriminator to compensate for the convolutional generator's lack of attention to global information. At the same time, to make the generator handle the small objects in RSIs better, channel attention is introduced to focus on high-frequency local contours. Then, an edge loss is designed to constrain the training process so that the edge details of the generated images are preserved more completely. Experiments show that the proposed method improves the visual reconstruction quality of SRSI more effectively, presenting clearer and richer detail, and that the PSNR and structural similarity (SSIM) of images reconstructed by DAE²GAN improve by up to 1.68 dB / 0.078 over existing mainstream methods. Therefore, the proposed DAE²GAN can efficiently assist various remote sensing tasks, such as urban road identification, agricultural monitoring, and geological exploration.
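The abstract does not give the exact formulation of the edge loss; as a rough sketch of the general idea only, and assuming a Sobel-gradient edge map (a hypothetical choice, not necessarily the authors' definition), an L1 penalty between the edge maps of the super-resolved and reference images could look like:

```python
import numpy as np

def sobel_edges(img):
    """Approximate edge-magnitude map via 3x3 Sobel gradients (valid convolution).

    Hypothetical edge extractor; the paper's actual edge operator may differ.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)  # horizontal gradient
            gy[i, j] = np.sum(patch * ky)  # vertical gradient
    return np.hypot(gx, gy)  # gradient magnitude

def edge_loss(sr, hr):
    """L1 distance between edge maps of the super-resolved (sr) and
    high-resolution reference (hr) images; added to the GAN objective to
    constrain training toward sharper, more complete edges."""
    return np.mean(np.abs(sobel_edges(sr) - sobel_edges(hr)))
```

In practice such a term would be implemented with batched convolutions in the training framework and weighted against the adversarial and pixel losses; the sketch above only illustrates the constraint itself.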