Image description is an important research direction in deep learning. The task combines computer vision and natural language processing: features extracted from an image are converted into high-level semantic information and expressed as a textual description, i.e. the computer learns to "read pictures and talk". This paper reviews several representative methods that have emerged in succession as image description has developed. Early research was dominated by template-based and retrieval-based methods. As deep learning flourished, deep learning-based image description became mainstream, beginning with end-to-end encoder-decoder models, which were subsequently refined with attention mechanisms. More recently, techniques based on the Transformer and on generative adversarial networks have greatly improved description accuracy.