2021
DOI: 10.48550/arxiv.2110.01485
Preprint

JuriBERT: A Masked-Language Model Adaptation for French Legal Text

Abstract: Language models have proven to be very useful when adapted to specific domains. Nonetheless, little research has been done on the adaptation of domain-specific BERT models in the French language. In this paper, we focus on creating a language model adapted to French legal text with the goal of helping law professionals. We conclude that some specific tasks do not benefit from generic language models pre-trained on large amounts of data. We explore the use of smaller architectures in domain-specific sub-languages…

Citations: Cited by 4 publications (4 citation statements)
References: References 9 publications (1 reference statement)
“…LegalDB, on the other hand, is a DistilBERT-based model that is pre-trained on English legal-specific training corpora. Lawformer [56] is a Longformer-based model that is pre-trained on large-scale Chinese legal long case documents, while JuriBERT [84] is a set of BERT models that uses LEGAL-BERT-SC as a pre-training model on French legal text datasets and adapts CamemBERT by additional pre-training on French legal text datasets.…”
Section: B. Pre-trained Language Model
confidence: 99%
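The adaptation this statement describes (continuing CamemBERT's masked-language-modeling pre-training on in-domain text) can be sketched with the Hugging Face transformers and datasets libraries. This is a minimal illustration, not the authors' released pipeline; the corpus file french_legal_corpus.txt and all hyperparameters are assumptions.

```python
# Minimal sketch of domain-adaptive MLM pre-training: continue training
# CamemBERT's masked-language-model head on French legal text.
# NOTE: the corpus path and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base")

# Hypothetical corpus: one legal document or paragraph per line.
dataset = load_dataset("text", data_files={"train": "french_legal_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Randomly mask 15% of tokens, the standard BERT/CamemBERT MLM setting.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="camembert-legal-mlm",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```

The same collator-driven setup applies to any encoder with an MLM head; swapping the checkpoint name is the only change needed to adapt a different base model.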
“…Lawformer [47] is a Longformer-based model pre-trained on large-scale Chinese legal long case documents. JuriBERT [69] is a new set of BERT models, consisting of a LEGAL-BERT-SC model pre-trained on French legal text datasets and an adaptation of CamemBERT obtained by additional pre-training on French legal text datasets. Besides the effort on domain-specific pre-trained language models, improving PLMs on legal tasks with document lengths longer than 512 tokens has been another research hotspot in the legal domain.…”
Section: Pre-trained Language Model
confidence: 99%
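For context on the 512-token limitation this statement mentions: BERT-style encoders such as CamemBERT truncate inputs at 512 tokens, so long case documents are often handled by sliding-window chunking rather than Lawformer-style sparse attention. A minimal sketch, where the window size and stride are illustrative assumptions:

```python
# Minimal sketch of the 512-token limit: split a long case document into
# overlapping 512-token windows that a BERT-style encoder can process.
# NOTE: window size and stride are illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")

long_case_text = "..."  # placeholder for a decision far longer than 512 tokens

encoded = tokenizer(
    long_case_text,
    max_length=512,
    truncation=True,
    stride=64,                       # 64-token overlap between consecutive windows
    return_overflowing_tokens=True,  # emit every window, not just the first
    padding="max_length",
    return_tensors="pt",
)
print(encoded["input_ids"].shape)    # (num_windows, 512)
```

Each window is then encoded independently, with per-window outputs pooled downstream.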
“…In addition, we introduce JuriBERT [Dou+21], a set of BERT models (tiny, mini, small and base) pre-trained from scratch on French legal-domain-specific corpora. JuriBERT models are pre-trained on 6.3GB of raw French legal text from two different sources: the first dataset is crawled from Légifrance, and the other consists of anonymized court decisions and claimants' pleadings from the Court of Cassation.…”
Section: Large Scale Linguistic Resources
confidence: 99%
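Pre-training from scratch, as described here for the tiny-to-base JuriBERT variants, additionally requires learning a tokenizer from the raw corpus before training a randomly initialised encoder. The sketch below assumes a hypothetical corpus file and a BERT-tiny-like configuration (2 layers, hidden size 128, following the Turc et al. small-BERT sizes); the actual JuriBERT tokenizer, vocabulary, and training schedule are not reproduced here.

```python
# Minimal sketch of from-scratch pre-training: learn a WordPiece vocabulary
# from the raw legal corpus, then train a small randomly initialised BERT
# with the MLM objective. NOTE: corpus path, vocabulary size, and model
# dimensions are illustrative assumptions, not JuriBERT's actual settings.
from datasets import load_dataset
from tokenizers import BertWordPieceTokenizer
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# 1) Learn a vocabulary directly from the in-domain corpus.
wp = BertWordPieceTokenizer(lowercase=False)
wp.train(files=["french_legal_corpus.txt"], vocab_size=32000)
wp.save_model(".")  # writes vocab.txt to the current directory
tokenizer = BertTokenizerFast(vocab_file="vocab.txt")

# 2) A "tiny" encoder: 2 layers, hidden size 128, randomly initialised.
config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=512,
)
model = BertForMaskedLM(config)

# 3) Same MLM objective as in continued pre-training, now from scratch.
dataset = load_dataset("text", data_files={"train": "french_legal_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="juribert-tiny-sketch", num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```

Learning the vocabulary on the legal corpus itself, rather than reusing a general-domain one, is what lets such models represent domain terminology with fewer subword splits.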