Osteosarcoma is one of the most common primary malignancies of bone in pediatric and adolescent populations. The morphology and size of osteosarcoma in MRI images vary greatly across patients. In developing countries with large populations and scarce medical resources, limited physician manpower alone cannot meet the demands of early osteosarcoma diagnosis. Moreover, as precision medicine advances, existing MRI segmentation models for osteosarcoma face the twin challenges of insufficient segmentation accuracy and high resource consumption. Inspired by the transformer's self-attention mechanism, this paper proposes a lightweight osteosarcoma image segmentation architecture, UATransNet, which adds a multilevel guided self-aware attention module (MGAM) to the encoder-decoder architecture of U-Net. We first optimize the dataset through classification and remove irrelevant background from the MRI images. UATransNet then places a transformer self-attention component (TSAC) and a global context aggregation component (GCAC) at the bottom of the encoder-decoder architecture to integrate local features with global dependencies and to aggregate context into the learned features. In addition, we apply dense residual learning to the convolution modules and combine it with multiscale skip connections to improve feature extraction. Experiments on more than 80,000 osteosarcoma MRI images show that UATransNet yields more accurate segmentation, with an IoU of 0.922 ± 0.03 and a DSC of 0.921 ± 0.04, providing physicians with intuitive, accurate, and efficient decision support.
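To make the architectural idea concrete, the sketch below (PyTorch) illustrates one way self-attention can be applied at the bottom of a U-Net-style encoder-decoder, as the abstract describes for the TSAC. This is a minimal illustration under assumed settings: the channel count, head count, and the class name BottleneckSelfAttention are hypothetical and do not reflect the paper's exact configuration.

```python
# Minimal sketch: transformer self-attention over a U-Net bottleneck feature
# map, mixing local encoder features with global dependencies.
# Assumptions (not from the paper): 512 channels, 8 heads, pre-norm residual.
import torch
import torch.nn as nn

class BottleneckSelfAttention(nn.Module):
    """Flattens the bottleneck feature map into a token sequence, applies
    multi-head self-attention, then restores the spatial layout."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C): one token per pixel
        normed = self.norm(tokens)
        attended, _ = self.attn(normed, normed, normed)  # global pairwise dependencies
        tokens = tokens + attended                       # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Usage: a 512-channel, 16x16 bottleneck feature map from the encoder.
feats = torch.randn(1, 512, 16, 16)
out = BottleneckSelfAttention(512)(feats)
print(out.shape)  # torch.Size([1, 512, 16, 16])
```

Because every bottleneck position attends to every other, this kind of block captures the global context that plain convolutions miss, which is why such components are typically placed at the lowest-resolution stage where the token sequence is short.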