Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.301

FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing

Abstract: We present a benchmark suite of four datasets for evaluating the fairness of pre-trained language models and the techniques used to fine-tune them for downstream tasks. Our benchmarks cover four jurisdictions (European Council, USA, Switzerland, and China), five languages (English, German, French, Italian and Chinese) and fairness across five attributes (gender, age, region, language, and legal area). In our experiments, we evaluate pre-trained language models using several group-robust fine-tuning techniques an…
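As context for how fairness across such protected attributes is typically quantified, the sketch below scores predictions separately for each group and reports the worst-group score. It is a minimal, assumption-laden illustration (the function name evaluate_by_group and the toy data are hypothetical), not the benchmark's released evaluation code.

```python
# Minimal sketch of group-wise fairness evaluation (hypothetical names,
# not the FairLex release code): per-group macro-F1 plus the worst-group score.
from collections import defaultdict
from sklearn.metrics import f1_score

def evaluate_by_group(y_true, y_pred, group_labels):
    """Compute macro-F1 per protected-attribute group and the minimum across groups."""
    by_group = defaultdict(lambda: ([], []))
    for gold, pred, group in zip(y_true, y_pred, group_labels):
        by_group[group][0].append(gold)
        by_group[group][1].append(pred)
    scores = {g: f1_score(t, p, average="macro") for g, (t, p) in by_group.items()}
    return scores, min(scores.values())

# Toy example: groups could stand in for gender, age bracket, region, etc.
scores, worst_group = evaluate_by_group(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 0, 1, 0, 0],
    group_labels=["A", "A", "A", "B", "B", "B"],
)
print(scores, worst_group)
```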

Cited by 12 publications (20 citation statements); references 19 publications (25 reference statements).
“…A similar effect is also observed by Baldini et al (2022). Chalkidis et al (2022) examine the effectiveness of debiasing methods over a multi-lingual benchmark dataset consisting of four subsets of legal documents, covering five languages and various sensitive attributes. They find that methods aiming to improve worst-case performance tend to fail in more realistic settings, where both target label and protected attribute distributions vary over time.…”
Section: Effectiveness of Debiasing Methods (supporting)
confidence: 62%
“…Beyond the standard definitions of fairness, a number of studies have examined the effectiveness of various debiasing methods in additional settings (Gonen and Goldberg, 2019; Meade et al, 2021; Lamba et al, 2021; Baldini et al, 2022; Chalkidis et al, 2022). For example, Meade et al (2021) not only examine the effectiveness of various debiasing methods but also measure the impact of debiasing methods on a model's language modeling ability and downstream task performance.…”
Section: Effectiveness of Debiasing Methods (mentioning)
confidence: 99%
“…AI/NLP researchers have uniformly adopted a Rawlsian notion of fairness. This is reflected in the by now common practice of citing Rawls when mentioning fairness [3][4][5][6][7]. Fairness plays a central role in the philosophy of John Rawls.…”
Section: AI/NLP Fairness Is Rawlsian (mentioning)
confidence: 99%
“…Welfare, like 'benefit', is performance as measured by the go-to performance metric. Rawlsian fairness thus becomes maximizing the performance on data sampled from the group on which performance is currently lowest. Many algorithms have therefore been developed to maximize performance on the groups with the worst performance.…”
Section: AI/NLP Fairness Is Rawlsian (mentioning)
confidence: 99%
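The worst-group objective described in this statement can be made concrete with a short loss sketch in the spirit of group distributionally robust optimization: compute the loss separately per protected group in a batch and optimize the highest one. This is a hypothetical minimal PyTorch example, not code from the cited papers or from FairLex.

```python
# Minimal sketch (hypothetical, not from the cited papers): a worst-group
# loss in the spirit of group distributionally robust optimization.
import torch
import torch.nn.functional as F

def worst_group_loss(logits, labels, groups):
    """Return the highest per-group mean cross-entropy in the batch."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    group_losses = [per_example[groups == g].mean() for g in torch.unique(groups)]
    # Minimizing this trains against the currently worst-off group,
    # i.e. it pushes up performance where performance is lowest.
    return torch.stack(group_losses).max()

# Toy usage with random tensors standing in for model outputs.
logits = torch.randn(8, 3, requires_grad=True)   # batch of 8 examples, 3 classes
labels = torch.randint(0, 3, (8,))               # gold labels
groups = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2])  # protected-attribute group ids
loss = worst_group_loss(logits, labels, groups)
loss.backward()
print(float(loss))
```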