2021
DOI: 10.48550/arxiv.2110.01485
Preprint

JuriBERT: A Masked-Language Model Adaptation for French Legal Text

Abstract: Language models have proven to be very useful when adapted to specific domains. Nonetheless, little research has been done on the adaptation of domain-specific BERT models in the French language. In this paper, we focus on creating a language model adapted to French legal text with the goal of helping law professionals. We conclude that some specific tasks do not benefit from generic language models pre-trained on large amounts of data. We explore the use of smaller architectures in domain-specific sub-languages…

Citations: Cited by 4 publications (4 citation statements)
References: References 9 publications (1 reference statement)
“…LegalDB, on the other hand, is a DistilBERT-based model that is pre-trained on English legal-specific training corpora. Lawformer [56] is a Longformer-based model that is pre-trained on large-scale Chinese legal long case documents, while JuriBERT [84] is a set of BERT models that uses LEGAL-BERT-SC as a pre-training model on French legal text datasets and adapts CamemBERT by additional pre-training on French legal text datasets.…”
Section: B. Pre-trained Language Model
confidence: 99%
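The adaptation this statement describes (continuing CamemBERT's masked-language-modeling pre-training on in-domain text) can be sketched with the Hugging Face transformers and datasets libraries. This is a minimal illustration, not the authors' released pipeline; the corpus file french_legal_corpus.txt and all hyperparameters are assumptions.

```python
# Minimal sketch of domain-adaptive MLM pre-training: continue training
# CamemBERT's masked-language-model head on French legal text.
# NOTE: the corpus path and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base")

# Hypothetical corpus: one legal document or paragraph per line.
dataset = load_dataset("text", data_files={"train": "french_legal_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Randomly mask 15% of tokens, the standard BERT/CamemBERT MLM setting.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="camembert-legal-mlm",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```

The same collator-driven setup applies to any encoder with an MLM head; swapping the checkpoint name is the only change needed to adapt a different base model.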
“…Lawformer [47] is a Longformer-based model pre-trained on large-scale Chinese legal long case documents. JuriBERT [69] is a new set of BERT models, consisting of a LEGAL-BERT-SC model pre-trained on French legal text datasets and an adaptation of CamemBERT obtained by additional pre-training on French legal text datasets. Besides the effort on domain-specific pre-trained language models, improving PLMs on legal tasks with document lengths longer than 512 tokens has been another research hotspot in the legal domain.…”
Section: Pre-trained Language Model
confidence: 99%
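For context on the 512-token limitation this statement mentions: BERT-style encoders such as CamemBERT truncate inputs at 512 tokens, so long case documents are often handled by sliding-window chunking rather than Lawformer-style sparse attention. A minimal sketch, where the window size and stride are illustrative assumptions:

```python
# Minimal sketch of the 512-token limit: split a long case document into
# overlapping 512-token windows that a BERT-style encoder can process.
# NOTE: window size and stride are illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")

long_case_text = "..."  # placeholder for a decision far longer than 512 tokens

encoded = tokenizer(
    long_case_text,
    max_length=512,
    truncation=True,
    stride=64,                       # 64-token overlap between consecutive windows
    return_overflowing_tokens=True,  # emit every window, not just the first
    padding="max_length",
    return_tensors="pt",
)
print(encoded["input_ids"].shape)    # (num_windows, 512)
```

Each window is then encoded independently, with per-window outputs pooled downstream.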
“…In addition, we introduce JuriBERT [Dou+21], a set of BERT models (tiny, mini, small and base) pre-trained from scratch on French legal-domain-specific corpora. JuriBERT models are pre-trained on 6.3GB of raw French legal text from two different sources: the first dataset is crawled from Légifrance, and the other consists of anonymized court decisions and claimants' pleadings from the Court of Cassation.…”
Section: Large Scale Linguistic Resources
confidence: 99%
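Pre-training from scratch, as described here for the tiny-to-base JuriBERT variants, additionally requires learning a tokenizer from the raw corpus before training a randomly initialised encoder. The sketch below assumes a hypothetical corpus file and a BERT-tiny-like configuration (2 layers, hidden size 128, following the Turc et al. small-BERT sizes); the actual JuriBERT tokenizer, vocabulary, and training schedule are not reproduced here.

```python
# Minimal sketch of from-scratch pre-training: learn a WordPiece vocabulary
# from the raw legal corpus, then train a small randomly initialised BERT
# with the MLM objective. NOTE: corpus path, vocabulary size, and model
# dimensions are illustrative assumptions, not JuriBERT's actual settings.
from datasets import load_dataset
from tokenizers import BertWordPieceTokenizer
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# 1) Learn a vocabulary directly from the in-domain corpus.
wp = BertWordPieceTokenizer(lowercase=False)
wp.train(files=["french_legal_corpus.txt"], vocab_size=32000)
wp.save_model(".")  # writes vocab.txt to the current directory
tokenizer = BertTokenizerFast(vocab_file="vocab.txt")

# 2) A "tiny" encoder: 2 layers, hidden size 128, randomly initialised.
config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=512,
)
model = BertForMaskedLM(config)

# 3) Same MLM objective as in continued pre-training, now from scratch.
dataset = load_dataset("text", data_files={"train": "french_legal_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="juribert-tiny-sketch", num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```

Learning the vocabulary on the legal corpus itself, rather than reusing a general-domain one, is what lets such models represent domain terminology with fewer subword splits.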