Motivation: Protein design is the well-known inverse protein-folding problem. Current protein design methods have a low success rate when designing a single sequence, which has motivated studies on predicting sequence profiles instead. Protein sequence profiles can be predicted computationally by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent 3D structures. To overcome the sparsity of protein structure data in 3D space, we converted 3D structures to 2D maps of pairwise residue distances. By integrating both 1D and 2D structural features, we developed a new method (SPROF) to predict protein sequence profiles based on an image-captioning learning framework. To the best of our knowledge, this is the first method to employ 2D distance maps for predicting protein properties.
Results: The best-performing model, SPROF, combines a recurrent neural network, a convolutional neural network and an attention mechanism. The method achieved a sequence recovery of 39.8% on the independent test set, representing a 5.2% improvement over SPIN2. We also found that sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using 2D distance maps is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction and protein interaction prediction. In addition, the generated sequence profiles will be helpful for improving existing protein design and fold recognition techniques.
Availability: https://github.com/sysu-yanglab/SPROF
Contact:
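
Illustration: The two quantities central to the abstract, the 2D map of pairwise residue distances and the sequence recovery rate, can be sketched in a few lines. This is a minimal sketch and not the SPROF implementation: the function names are hypothetical, the use of one representative atom per residue (for example C-alpha) is an assumption, and the 10 angstrom neighbor cutoff is an arbitrary illustrative value rather than one taken from the paper.

```python
import math


def distance_map(coords):
    """Return the L x L matrix of pairwise Euclidean distances.

    `coords` holds one (x, y, z) tuple per residue; which atom represents
    a residue (e.g. C-alpha) is an assumption of this sketch.
    """
    return [[math.dist(a, b) for b in coords] for a in coords]


def neighbor_counts(dmap, cutoff=10.0):
    """Count, for each residue, the other residues within `cutoff`.

    The default cutoff of 10.0 is illustrative, not a value from the paper.
    """
    return [
        sum(1 for j, d in enumerate(row) if j != i and d <= cutoff)
        for i, row in enumerate(dmap)
    ]


def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue matches the native one."""
    if len(predicted) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    return sum(p == n for p, n in zip(predicted, native)) / len(native)


# Toy three-residue structure (coordinates are made up for illustration).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
toy_map = distance_map(toy_coords)
toy_neighbors = neighbor_counts(toy_map)
toy_recovery = sequence_recovery("ACDE", "ACDF")
```

In this toy case the map is symmetric with a zero diagonal, which is what allows a structure of any length L to be treated as an L x L image by a convolutional network. Grouping per-residue recovery by the neighbor counts is one way the reported trend between recovery and the number of neighboring residues could be examined.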