Proceedings of the Natural Legal Language Processing Workshop 2021
DOI: 10.18653/v1/2021.nllp-1.9

JuriBERT: A Masked-Language Model Adaptation for French Legal Text

Abstract: Language models have proven to be very useful when adapted to specific domains. Nonetheless, little research has been done on adapting domain-specific BERT models to French. In this paper, we focus on creating a language model adapted to French legal text with the goal of helping law professionals. We conclude that some specific tasks do not benefit from generic language models pre-trained on large amounts of data. We explore the use of smaller architectures in domain-specific sub-languages.
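To make the abstract's setting concrete, here is a minimal sketch of querying a domain-adapted masked-language model such as JuriBERT with the Hugging Face transformers fill-mask pipeline. The checkpoint identifier and example sentence are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: querying a French legal masked-language model.
# "path/to/juribert" is a placeholder checkpoint name, not the
# published identifier -- substitute the actual model path.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="path/to/juribert")

# Mask one token in a French legal sentence; a domain-adapted model
# should rank legal vocabulary above generic French completions.
text = f"La Cour de {fill_mask.tokenizer.mask_token} rejette le pourvoi."
for prediction in fill_mask(text):
    print(prediction["token_str"], round(prediction["score"], 3))
```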

Cited by 15 publications (10 citation statements). References 3 publications.
“…Our dataset concerns another type of litigation (habitual residency of children), and we focus on the manual construction of our dataset instead of constructing it automatically. Finally, Douka et al. (2021) introduced JuriBERT, trained on Légifrance, an official website publishing all French law, and evaluated it on topic classification tasks for documents from the Cour de Cassation (the highest court in France).…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…As regards pretrained language models, we used FlauBERT and CamemBERT (Martin et al., 2020), two general-purpose pretrained models for French, as well as JuriBERT (Douka et al., 2021), a language model trained only on data from the legal domain.…”
Section: Models (citation type: mentioning)
confidence: 99%
“…In the legal domain, text classification has an established tradition, both in the monolingual (Šarić et al., 2014; Papaloukas et al., 2021) and in the multilingual setting (Steinberger et al., 2006, 2012; Chalkidis et al., 2019; Avram et al., 2021; Chalkidis et al., 2021). Moreover, the large availability of legal data, produced by national and supranational public institutions, set the stage for the development of domain-adapted models (Chalkidis et al., 2020; Douka et al., 2021; Masala et al., 2021; Licari and Comandè, 2022). As for Italian, a multi-label classification system for bills has been proposed by De Angelis et al. (2022), based on a Bi-GRU architecture using static word embeddings and employing a dataset of 28k legal documents tagged with the TESEO thesaurus.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Pretrained language models (PLMs; Devlin et al., 2019; Liu et al., 2019; Raffel et al., 2020) have seen broad adaptation across various domains such as biology, healthcare (Alsentzer et al., 2019), law (Chalkidis et al., 2020; Douka et al., 2021), software engineering (Tabassum et al., 2020), and social media (Röttger and Pierrehumbert, 2021; Guo et al., 2021). These models benefit from in-domain corpora (e.g., PubMed for the biomedical domain) to learn domain-specific terms and concepts.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
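The domain adaptation these citation statements refer to can be sketched as continued masked-language-model pretraining on an in-domain corpus. The snippet below is a hedged illustration using Hugging Face transformers and datasets; the starting checkpoint, corpus file, and hyperparameters are assumptions for illustration, not the paper's actual recipe (JuriBERT itself may have been pretrained from scratch rather than adapted from a checkpoint).

```python
# Sketch of domain-adaptive MLM pretraining, under assumed names:
# "camembert-base" as a general-purpose French starting point and
# "legal_corpus.txt" as a hypothetical one-document-per-line corpus.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "camembert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Load and tokenize the raw in-domain text.
corpus = load_dataset("text", data_files={"train": "legal_corpus.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Randomly mask 15% of tokens: the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="legal-mlm", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```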