“…Fairness research in NLP has seen tremendous growth in the past few years (e.g., Bolukbasi et al., 2016; Caliskan et al., 2017; Webster et al., 2018; Díaz et al., 2018; Dixon et al., 2018; De-Arteaga et al., 2019; Gonen and Goldberg, 2019; Manzini et al., 2019) over a range of NLP tasks such as co-reference resolution and machine translation, as well as the tasks we studied: sentiment analysis and toxicity prediction. Some of this work studies bias by swapping names in sentence templates (Caliskan et al., 2017; Kiritchenko and Mohammad, 2018; May et al., 2019; Gonen and Goldberg, 2019); however, these studies use synthetic sentence templates, whereas we extract naturally occurring sentences from the target corpus.…”