Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law 2021
DOI: 10.1145/3462757.3466104
Legal Norm Retrieval with Variations of the BERT Model Combined with TF-IDF Vectorization

Abstract: In this work, we examine variations of the BERT model on the statute law retrieval task of the COLIEE competition. This includes approaches to leverage BERT's contextual word embeddings, fine-tuning the model, combining it with TF-IDF vectorization, adding external knowledge to the statutes, and data augmentation. Our ensemble of Sentence-BERT with two different TF-IDF representations and document enrichment exhibits the best performance on this task regarding the F2 score. This is followed by a fine-tuned LEGAL…
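The abstract's ensemble idea — scoring statutes with multiple lexical representations and fusing the scores — can be illustrated with a minimal sketch. This is not the authors' implementation: the statute snippets and query are invented, the Sentence-BERT component is omitted, and only the "two different TF-IDF representations" part (here assumed to be word and character n-grams) is shown, fused by summing cosine similarities.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy statute collection and query (invented for illustration).
statutes = [
    "A contract requires offer and acceptance by both parties.",
    "A minor may rescind a contract before reaching the age of majority.",
]
query = ["Can a minor cancel a contract?"]

# Two different TF-IDF representations: word n-grams and character n-grams.
word_vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 2))
char_vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))

# Fuse by summing cosine similarities across representations.
scores = 0
for vec in (word_vec, char_vec):
    doc_mat = vec.fit_transform(statutes)
    q_mat = vec.transform(query)
    scores = scores + cosine_similarity(q_mat, doc_mat)[0]

# Rank statutes by the fused score, highest first.
ranking = scores.argsort()[::-1]
```

In the paper's setting, a Sentence-BERT similarity score would be added to the fusion in the same way, and retrieval quality would be evaluated with the F2 score used by COLIEE.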

Cited by 23 publications (3 citation statements)
References 11 publications
“…Interestingly, from this result, pretrained models tend to achieve higher performance than the non-pretrained model (i.e., Attentive CNN). Table 4.7 presents the final performance on the test set after ensembling with the lexical score by the optimal value of α. Paraformer outperforms other models, achieves state-of-the-art results in Precision (0.7901) and Macro-F2 (0.7407), and surpasses the current state-of-the-art system by Wehnert et al. [63]. The best recall belongs to the systems of Nguyen et al. [45] and Wehnert et al. [63].…”
Section: Methods (mentioning)
confidence: 84%
“…Several approaches are available to transform textual data into a numeric format. For our case, we have selected the count vectorizer (CV) [18] and TF-IDF vectorizer [19] for this purpose due to their high effectiveness in the area of NLP.…”
Section: Data Preparation (mentioning)
confidence: 99%
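The two vectorizers named in the citation above differ only in weighting: a count vectorizer records raw term frequencies, while a TF-IDF vectorizer scales those counts by inverse document frequency, down-weighting terms that occur in many documents. A minimal sketch with scikit-learn (the documents are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the court held the claim", "the claim was dismissed"]

# Count vectorizer: raw term counts per document.
cv = CountVectorizer()
counts = cv.fit_transform(docs)

# TF-IDF vectorizer: same vocabulary, counts reweighted by
# inverse document frequency and L2-normalized per document.
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)
```

Both produce sparse document-term matrices over the same vocabulary; only the cell values differ.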
“…Nowadays, many BERT language models take advantage of their underlying transformer approach to produce a specific BERT model fine-tuned for NER tasks in different languages (Souza et al., 2019; Labusch et al., 2019; Jia et al., 2020; Taher et al., 2020). There is also research on BERT in the legal domain that applies it to various legal tasks such as topic modeling (Silveira et al., 2021), legal norm retrieval (Wehnert et al., 2021), and legal case retrieval (Shao et al., 2020).…”
Section: Introduction (mentioning)
confidence: 99%