“…The pretrain-finetune paradigm has been highly successful in tackling challenging problems in natural language processing, e.g., domain adaptation (Sato et al., 2020; Yao et al., 2020), incremental learning (Khayrallah et al., 2018; Wan et al., 2020), and knowledge transfer (Liu et al., 2020b). The rise of large-scale pre-trained language models has further drawn attention to this strategy (Devlin et al., 2019; Edunov et al., 2019).…”