Automatic code summarization generates high-level natural language descriptions of code snippets, which benefits software maintenance and code comprehension. Recently, Transformer-based models have achieved state-of-the-art performance on code summarization tasks. However, for some programming languages, the data available for training neural models is scarce. To fill this gap, we propose a novel transfer learning approach that accurately transfers knowledge between Transformer-based models. We train a discriminator to identify which heads of the multi-head attention module should be transferred, and on this basis we define a transfer strategy for the parameter matrices. We evaluate the proposed transfer learning approach on four state-of-the-art Transformer-based code summarization models. Experimental results show that models with transferred knowledge outperform the original models by up to 10.70% in BLEU, 5.36% in ROUGE-L, and 4.34% in METEOR.
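To make the head-wise transfer idea concrete, the following is a minimal sketch of what copying the parameter slices of selected attention heads from a source model into a target model could look like. It assumes a PyTorch-style attention module with separate query/key/value/output projections and contiguously laid-out heads; the module layout, the attribute names (q_proj, k_proj, v_proj, o_proj), and the chosen head indices are illustrative assumptions, not the paper's actual implementation or discriminator.

```python
import torch
import torch.nn as nn

class SimpleMHA(nn.Module):
    """Minimal multi-head attention parameter container, for illustration only."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.d_model, self.n_heads = d_model, n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

def transfer_selected_heads(src: SimpleMHA, tgt: SimpleMHA, head_ids):
    """Copy the parameter slices belonging to the selected heads from the
    source attention module into the target attention module."""
    d_head = src.d_model // src.n_heads
    with torch.no_grad():
        for h in head_ids:
            lo, hi = h * d_head, (h + 1) * d_head
            # Q/K/V projections: head h owns output rows [lo, hi).
            for name in ("q_proj", "k_proj", "v_proj"):
                getattr(tgt, name).weight[lo:hi, :] = getattr(src, name).weight[lo:hi, :]
                getattr(tgt, name).bias[lo:hi] = getattr(src, name).bias[lo:hi]
            # Output projection: head h owns input columns [lo, hi).
            tgt.o_proj.weight[:, lo:hi] = src.o_proj.weight[:, lo:hi]

# Example: transfer heads 0 and 3, e.g. those flagged by a trained discriminator.
src, tgt = SimpleMHA(512, 8), SimpleMHA(512, 8)
transfer_selected_heads(src, tgt, head_ids=[0, 3])
```

In this sketch, which heads are transferred would be decided by the discriminator described above; only the mechanics of slicing and copying the per-head blocks of the projection matrices are shown here.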