“…For evaluating our models, we report standard metrics of BLEU4, METEOR and ROUGE-L. As baselines, we take two of the non-BERT state-of-the-art models (Du and Cardie, 2018; Zhang and Bansal, 2019) and the two BERT-based QG models (Dong et al, 2019; Chan and Fan, 2019). We experimented with 4 settings: one without using any copy mechanism (No Copy), one using normal copy (Normal Copy; §3.3.1), one using self-copy (Self-Copy; §3.3.2) and finally with two-hop self-copy (Two-Hop Self-Copy; §3.3.3).…

Model                                  BLEU4   METEOR   ROUGE-L
CorefNQG (Du and Cardie, 2018)         15.16   19.12    -
SemdriftQG (Zhang and Bansal, 2019)    18.37   22.65    6.68
Recurrent-BERT (Chan and Fan, 2019)    20.33   23.88    48.23
UniLM (Dong et al, 2019)               22

… Du et al (2017). BERT refers to BERT-Large (cased) model (Devlin et al, 2019).”
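
For readers unfamiliar with the ROUGE-L metric reported above, the following is a minimal sketch of a sentence-level ROUGE-L F-score based on the longest common subsequence (LCS) of tokens. It is not the evaluation script used by the authors: the function names `lcs_length` and `rouge_l`, the lowercased whitespace tokenization, and the default `beta=1.0` are all assumptions made for illustration, and published toolkits differ in these choices, so absolute numbers would not be comparable with the table.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence that ended before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping x or skipping y.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-score (illustrative; beta is an assumption)."""
    # Assumed tokenization: lowercase and split on whitespace.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    # Weighted harmonic mean of LCS precision and LCS recall.
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    generated = "what is the capital of france"
    gold = "what is the capital city of france"
    print(round(rouge_l(generated, gold), 4))
```

Scores in tables such as the one above are usually reported as percentages averaged over a test set, so a per-sentence value from this sketch would be multiplied by 100 and averaged before any comparison.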