Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.297
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Abstract: Laws and their interpretations, legal arguments and agreements are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across var…


Cited by 43 publications (59 citation statements)
References 81 publications
“…Thus, we only report the results for HierBERT with LSTM + Attn on top. Our model with InLegalBERT obtains higher performance over all results reported in Chalkidis et al (2022) for ECtHR-B: the macro-F1 obtained by our model is 75.88%, compared to the best value of 74.7% reported in Chalkidis et al (2022).…”
Section: Results
Confidence: 48%
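The macro-F1 figures compared above (75.88% vs. 74.7%) average the per-label F1 scores without weighting by label frequency, which matters for multi-label tasks like ECtHR-B where some Convention articles are rare. A minimal sketch of that computation using scikit-learn's `f1_score` on toy multi-label data (not the actual LexGLUE evaluation code):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label targets and predictions (3 samples, 4 labels),
# standing in for per-article violation labels as in ECtHR-B.
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [1, 0, 0, 1]])

# Macro-F1: compute F1 per label, then take the unweighted mean,
# so rare labels count as much as frequent ones.
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Micro-F1 pools TP/FP/FN over all labels, favoring frequent labels.
micro = f1_score(y_true, y_pred, average="micro", zero_division=0)

print(f"macro-F1 = {macro:.4f}, micro-F1 = {micro:.4f}")
```

On this toy data the macro score is dragged down by the one label that is never predicted, while the micro score is not, illustrating why benchmark papers report both.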
“…These models have already obtained better results than those originally reported for two important benchmark tasks over Indian legal text: the LSI task over the ILSI dataset (Paul et al, 2022), and the CJPE task over the ILDC dataset. The InLegalBERT model also improves across two benchmarks over non-Indian legal text: the LSI task over the ECtHR-B dataset (Chalkidis et al, 2022) and the Semantic Segmentation task over the UKSS dataset. Given that the number of NLP studies on legal text has been increasing rapidly in recent years, we hope that these LMs will benefit other researchers working on Legal NLP.…”
Section: Discussion
Confidence: 89%
“…Recent approaches explore modeling moral and ethical judgement of real-life anecdotes from Reddit (Emelin et al, 2021;Sap et al, 2019a;Lourie et al, 2021;Botzer et al, 2022), with DELPHI (Jiang et al, 2021a) unifying the moral judgement prediction on these related benchmarks. Related is another line of work modeling legal judgement on judicial corpora (Chalkidis et al, 2022).…”
Section: Related Work
Confidence: 99%
“…Recently, the rapid development of large-scale pre-trained language models (PLMs) based on transformers significantly benefits this area (Cui et al, 2022). Some of the PLMs including BERT (Devlin et al, 2018) are further pre-trained on legal corpora, such as Legal-BERT (Chalkidis et al, 2020), exhibiting the SOTA performance on legal text processing benchmarks (e.g., LexGLUE) (Zheng et al, 2021;Chalkidis et al, 2022a). However, in the meantime, some severe problems of models are also discovered, including unfairness and discrimination (Chalkidis et al, 2022b).…”
Section: Related Work
Confidence: 99%