“…According to results from previous research, sequence-level measures are superior to aggregating token-level information for sequence labeling with CRF models (Settles and Craven, 2008; Chen et al, 2015b; Shen et al, 2017; Liu et al, 2020). In our experiments, we incorporate the following query methods, the most representative of those explored in prior work for NER tasks (Settles and Craven, 2008; Chen et al, 2015b; Shen et al, 2017; Chen et al, 2017; Siddhant and Lipton, 2018; Shelmanov et al, 2019; Chaudhary et al, 2019; Grießhaber et al, 2020; Shui et al, 2020; Ren et al, 2021; Liu et al, 2020, 2022; Agrawal et al, 2021):…”