In medical image segmentation, convolutional neural networks with a serial encoder-decoder structure are the typical choice. However, mainstream networks cannot simultaneously extract sufficient global features and fuse multi-scale information, which may lead to unsatisfactory results when segmenting pathological images. This article therefore proposes a novel multi-scale feature fusion and global self-attention network (MSSA-Net) for medical image segmentation. Specifically, we design a parallel double-encoder network consisting of a multi-scale feature fusion encoder (MS-Encoder) and a self-attention encoder (SA-Encoder). The SA-Encoder introduces the transformer's global self-attention mechanism to extract global features, and the MS-Encoder adopts atrous spatial pyramid pooling (ASPP) to realize multi-scale fusion. We evaluate the proposed MSSA-Net on three medical segmentation datasets covering imaging modalities such as colonoscopy and magnetic resonance imaging. Experiments on CVC-ClinicDB, the dataset from the 2015 MICCAI subchallenge on automatic polyp detection, and Anatomical Tracings of Lesions After Stroke (ATLAS) show that MSSA-Net outperforms mainstream methods such as DoubleU-Net and TransUNet. Moreover, MSSA-Net predicts more accurate segmentation masks, especially on ATLAS, whose images are challenging because they contain multiple shadow areas and discrete lesions.
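
The parallel double-encoder idea can be illustrated with a toy sketch. It is not the MSSA-Net implementation: the one-dimensional input, the function names (`atrous_conv1d`, `aspp_branch`, `self_attention_branch`, `dual_encoder`), the dilation rates, the averaging of the atrous responses, the identity query/key/value projections, and the fusion by concatenation are all assumptions made here for illustration. One branch applies the same kernel at several dilation rates, in the spirit of ASPP, and the other applies scaled dot-product self-attention over all positions.

```python
import math


def atrous_conv1d(signal, kernel, rate):
    """Dilated 1D cross-correlation with zero padding; output has the input's length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # kernel taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp_branch(signal, kernel, rates=(1, 2, 4)):
    """ASPP-style multi-scale fusion (assumed simplification: average over rates)."""
    responses = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(rates) for vals in zip(*responses)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_branch(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections (assumed)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Every position attends to every other position: a global receptive field.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def dual_encoder(signal, kernel=(0.25, 0.5, 0.25)):
    """Run both branches in parallel and concatenate their features per position."""
    ms = aspp_branch(signal, kernel)                     # multi-scale branch
    sa = self_attention_branch([[x] for x in signal])    # global branch
    return [[m, s[0]] for m, s in zip(ms, sa)]


features = dual_encoder([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
print(len(features), len(features[0]))
```

In this sketch each position ends up with one multi-scale feature and one global feature. A real network would operate on 2D feature maps with learned convolution and projection weights, and would typically fuse the ASPP responses by concatenation followed by a 1x1 convolution rather than by averaging.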