Deep neural networks often need to be trained with a large number of samples in a dataset. When the training samples in a dataset are not enough, the performance of the model will degrade. The Generative Adversarial Network (GAN) is considered to be effective at generating samples, and thus, at expanding the datasets. Consequently, in this paper, we proposed a novel method, called the Stacked Siamese Generative Adversarial Network (SSGAN), for generating large-scale images with high quality. The SSGAN is made of a Color Mean Segmentation Encoder (CMS-Encoder) and several Siamese Generative Adversarial Networks (SGAN). The CMS-Encoder extracts features from images using a clustering-based method. Therefore, the CMS-Encoder does not need to be trained and its output has a high interpretability of human visuals. The proposed Siamese Generative Adversarial Network (SGAN) controls the category of generated samples while guaranteeing diversity by introducing a supervisor to the WGAN. The SSGAN progressively learns features in the feature pyramid. We compare the Fréchet Inception Distance (FID) of generated samples of the SSGAN with previous works on four datasets. The result shows that our method outperforms the previous works. In addition, we trained the SSGAN on the CelebA dataset, which consists of cropped images with a size of 128 × 128. The good visual effect further proves the outstanding performance of our method in generating large-scale images.