Anais do XIII Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (STIL 2021), 2021
DOI: 10.5753/stil.2021.17803

verBERT: Automating Brazilian Case Law Document Multi-label Categorization Using BERT

Abstract: In this work, we carried out a study on the use of attention-based algorithms to automate the categorization of Brazilian case law documents. We used data from the Kollemata Project to produce two distinct datasets with adequate class systems. We then implemented a multi-class, multi-label version of BERT and fine-tuned different BERT models on the produced datasets. We evaluated several metrics, adopting the micro-averaged F1-score as our main metric, for which we obtained a performance value of ⟨F1⟩m…
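The abstract describes fine-tuning BERT for multi-class, multi-label classification and evaluating with micro-averaged F1. As an illustration only, here is a minimal sketch using the Hugging Face transformers library; the paper's actual checkpoint, label set, and hyperparameters are not given in this report, so the BERTimbau checkpoint name, NUM_LABELS, the sample texts, and the 0.5 decision threshold are all assumptions.

```python
# Hypothetical sketch of multi-label BERT fine-tuning; not the paper's code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import f1_score

NUM_LABELS = 16  # assumption: the paper's class-system size is not stated here

# BERTimbau (Portuguese BERT) is one plausible base model for Brazilian legal text
checkpoint = "neuralmind/bert-base-portuguese-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # uses BCEWithLogitsLoss internally
)
model.train()

# Placeholder documents and multi-hot labels (a document may carry several classes)
texts = ["Trecho de um acórdão...", "Outro documento jurídico..."]
labels = torch.zeros(len(texts), NUM_LABELS)  # float targets, as BCE requires
labels[0, [2, 5]] = 1.0
labels[1, [5]] = 1.0

batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one training step; wrap in an optimizer loop in practice

# Prediction: independent sigmoid per label, thresholded at 0.5 (assumption)
preds = (torch.sigmoid(outputs.logits) > 0.5).int()
f1_micro = f1_score(labels.int().numpy(), preds.numpy(), average="micro")
```

The micro average pools true positives, false positives, and false negatives across all labels before computing F1, which is a common choice when label frequencies are imbalanced, as is typical for case law class systems.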

Cited by 3 publications (1 citation statement) · References: 16 publications
“…4 shows that the most frequent technique is BERT, used by 63 studies (34%); it is a Transformer-based language model that stood out for its performance in NLP tasks, mainly for pre-training representations from unlabeled text [Devlin et al. 2018]. In particular, the high usage of the pre-trained BERT model indicates recognition of its effectiveness in capturing semantic relationships and its applicability to several tasks, such as natural language inference [Nanclarez et al. 2022], text classification [Serras and Finger 2021, Ferraz et al. 2021], sentiment analysis [Britto et al. 2022], and entity extraction through token classification [Lochter et al. 2020]. Besides, some BERT variants are also present in our analysis, notably BERTimbau, discussed in 32 papers (17%); Multilingual BERT (M-BERT), used in 15 studies (8%); and BERTopic, mentioned in 6 works (3%).…”
Section: Tools and Techniques in NLP for Social Media Analysis
confidence: 99%