Recent years have witnessed the success of abstractive summarization using the encoder-decoder framework with sequence-to-sequence models (Rush et al., 2015; Nallapati et al., 2016; See et al., 2017; Celikyilmaz et al., 2018). The encoder, which compresses the source text into a latent representation, can be implemented with recurrent neural networks (Chopra et al., 2016; Tan et al., 2017; Chen and Bansal, 2018), convolutional networks (Allamanis et al., 2016; Liu et al., 2018), or transformer-based methods (Devlin et al., 2019; Song et al., 2020b). To handle the out-of-vocabulary (OOV) words generated by a vanilla sequence-to-sequence decoder, the copy mechanism was proposed, which at each step either copies a word from the source text or selects a word from the vocabulary (See et al., 2017; Zhou et al., 2018).
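As a concrete illustration of the copy mechanism described above, the following is a minimal numpy sketch of the pointer-generator mixing step in the style of See et al. (2017): the final output distribution is a convex combination of the vocabulary distribution (weighted by a generation probability `p_gen`) and the attention distribution scattered onto the source tokens' ids. The function name `copy_distribution` and the toy values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def copy_distribution(p_vocab, attention, src_ids, p_gen, extended_size):
    """Mix generation and copy distributions (pointer-generator style sketch).

    p_vocab:       probabilities over the fixed vocabulary, shape (V,)
    attention:     decoder attention over source positions, shape (T,)
    src_ids:       vocabulary ids of the source tokens (source-only OOV
                   words get temporary ids >= V in an extended vocabulary)
    p_gen:         scalar in [0, 1], probability of generating vs. copying
    extended_size: V plus the number of source-only OOV words
    """
    final = np.zeros(extended_size)
    final[: p_vocab.size] = p_gen * p_vocab
    # Scatter-add the copy probability onto each source token's id, so OOV
    # source words (ids >= V) become reachable in the output distribution.
    np.add.at(final, src_ids, (1.0 - p_gen) * attention)
    return final

# Toy example: a vocabulary of 5 words; the source has three tokens,
# one of which is an OOV word assigned the extended id 5.
p_vocab = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
attention = np.array([0.5, 0.3, 0.2])
src_ids = np.array([1, 5, 3])
dist = copy_distribution(p_vocab, attention, src_ids, p_gen=0.8, extended_size=6)
assert abs(dist.sum() - 1.0) < 1e-9  # remains a valid distribution
```

With `p_gen = 0.8`, 20% of the probability mass is routed through attention to source positions, which is how the OOV token (id 5) receives nonzero probability despite being absent from the fixed vocabulary.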