Panoramic dental X-ray images are an essential diagnostic tool that dentists use to detect symptoms at an early stage and to develop appropriate treatment plans. In recent years, deep learning methods have been applied to tooth segmentation in dental X-rays to assist dentists in making clinical decisions. Because the original images contain a large amount of irrelevant information, it is necessary to extract the region of interest (ROI), here the maxillofacial region, to obtain more accurate results. However, fast and accurate maxillofacial segmentation without hand-crafted features is challenging because of poor image quality. In this study, we create a large maxillofacial dataset and propose an efficient encoder-decoder network named EED-Net to address this problem. The dataset consists of 2602 panoramic dental X-ray images and corresponding segmentation masks annotated by trained experts. Building on the original U-Net structure, our model contains three major modules: a feature encoder, a corresponding decoder, and a multipath feature extractor that connects the encoding path and the decoding path. To obtain richer semantic features in both depth and breadth, we replace the convolution layers in the encoder with residual blocks and adopt Inception-ResNet blocks in the multipath feature extractor. Inspired by the skip connections in FCN-8s, the lightweight decoder uses a channel dimension equal to the number of segmented objects. In addition, a weighted loss function is used to improve segmentation accuracy. Comprehensive experiments on the new dataset demonstrate that our model achieves a better trade-off between accuracy and speed for maxillofacial segmentation than the latest methods.
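
The abstract does not state the form of the weighted loss function. As one common possibility, a class-weighted pixel-wise cross-entropy can be sketched as follows. The function name `weighted_cross_entropy`, the two-class setup, and the weight values are illustrative assumptions and are not taken from EED-Net itself.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    """Class-weighted pixel-wise cross-entropy (illustrative sketch only).

    probs[i][j] is a list of C predicted class probabilities for pixel (i, j).
    labels[i][j] is the true class index of pixel (i, j).
    class_weights[c] is the non-negative weight given to class c.
    """
    weighted_nll = 0.0
    weight_sum = 0.0
    for prob_row, label_row in zip(probs, labels):
        for pixel_probs, label in zip(prob_row, label_row):
            w = class_weights[label]
            # Negative log-likelihood of the true class, scaled by its class weight.
            weighted_nll += -w * math.log(pixel_probs[label] + eps)
            weight_sum += w
    # Normalize by the total weight so the loss scale is comparable across weightings.
    return weighted_nll / weight_sum


# Hypothetical 1x2 image: one background pixel (class 0), one foreground pixel (class 1).
probs = [[[0.9, 0.1], [0.4, 0.6]]]
labels = [[0, 1]]
uniform = weighted_cross_entropy(probs, labels, [1.0, 1.0])
foreground_heavy = weighted_cross_entropy(probs, labels, [1.0, 3.0])
```

In this toy case the foreground pixel is predicted less confidently than the background pixel, so raising the foreground weight raises the loss. This is the general mechanism by which a weighted loss shifts training emphasis toward a chosen class.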