2022
DOI: 10.48550/arxiv.2209.06049
Preprint

Pre-training Transformers on Indian Legal Text

Abstract: Natural Language Processing in the legal domain has benefited hugely from the emergence of Transformer-based Pre-trained Language Models (PLMs) pre-trained on legal text. There exist PLMs trained over European and US legal text, most notably Legal-BERT. However, with the rapidly increasing volume of NLP applications on Indian legal documents, and the distinguishing characteristics of Indian legal text, it has become necessary to pre-train LMs over Indian legal text as well. In this work, we introduce transforme…

Cited by 2 publications (1 citation statement)
References 17 publications
“…We used pre-trained transformers trained on a general corpus, such as XLNet (Yang et al., 2019) and RoBERTa (Liu et al., 2019b), as well as transformers trained on a legal corpus: LegalBERT (Chalkidis et al., 2020), InLegalBERT and InCaseLawBERT (Paul et al., 2022). We also trained BERT-large on Indian judgment cases. Since these transformers cannot accommodate more than 512 tokens, we fed only the last 510 tokens of each document (two special tokens are reserved for [CLS] and [SEP]), as Malik et al. (2021) note that the most relevant information is generally present at the end of the documents.…”
Section: Legal Judgment Prediction (LJP)
Confidence: 99%
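A minimal sketch of the tail-truncation strategy the citing authors describe, assuming the Hugging Face transformers tokenizer API; the checkpoint name, helper function, and return format are illustrative assumptions, not the cited paper's code.

```python
from transformers import AutoTokenizer

# Assumed checkpoint name for illustration; swap in whichever PLM you use.
tokenizer = AutoTokenizer.from_pretrained("law-ai/InLegalBERT")

def tail_truncate(text: str, max_len: int = 512) -> dict:
    """Keep only the last (max_len - 2) tokens of a long judgment,
    reserving two slots for the [CLS] and [SEP] special tokens."""
    # Tokenize without special tokens first, then take the tail.
    ids = tokenizer.encode(text, add_special_tokens=False)
    tail = ids[-(max_len - 2):]  # last 510 tokens for a 512-token model
    input_ids = [tokenizer.cls_token_id] + tail + [tokenizer.sep_token_id]
    return {"input_ids": input_ids, "attention_mask": [1] * len(input_ids)}

enc = tail_truncate("... full judgment text ...")
assert len(enc["input_ids"]) <= 512
```

Taking the tail rather than the head follows the observation attributed to Malik et al. (2021) that the decisive content of Indian judgments tends to sit at the end of the document.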