“…To address dataset bias, a straightforward solution is to manipulate the data so that models do not capture unintended dataset biases during training, through data balancing (a.k.a. resampling) (Dixon et al, 2018; Geng et al, 2007; Chen et al, 2017; Sun et al, 2018; Rayhan et al, 2017; Nguyen et al, 2011) and data augmentation (Wei and Zou, 2019; Qian et al, 2020b). Another common paradigm for text classification is to design model-level balancing mechanisms, including unbiased embeddings (Bolukbasi et al, 2016; Kaneko and Bollegala, 2019), threshold correction (Kang et al, 2020; Provost, 2000; Calders and Verwer, 2010) and instance weighting (Zhao et al, 2017; Jiang and Zhai, 2007).…”
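
To make the contrast between the data-level and model-level families concrete, the sketch below sets random oversampling (one simple form of resampling) beside inverse-frequency instance weighting. It is only an illustration under stated assumptions: the function names `oversample` and `inverse_frequency_weights` are hypothetical, and the generic schemes shown here are not the specific methods of any of the cited works.

```python
import random
from collections import Counter


def oversample(examples, labels, seed=0):
    """Data-level balancing: duplicate random minority-class examples
    until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw (with replacement) the extra copies this class needs.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Model-level balancing: weight each instance by n / (k * count(class)),
    so that every class contributes the same total weight to the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


if __name__ == "__main__":
    # Hypothetical toy data: four negative examples, one positive.
    texts = ["a", "b", "c", "d", "e"]
    labels = ["neg", "neg", "neg", "neg", "pos"]
    _, balanced_labels = oversample(texts, labels)
    print(Counter(balanced_labels))
    print(inverse_frequency_weights(labels))
```

Oversampling changes the training set itself, whereas instance weighting leaves the data untouched and rescales each example's contribution to the training loss; in this simple class-imbalance setting, both aim to give each class equal overall influence.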