Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/484

BERT-PLI: Modeling Paragraph-Level Interactions for Legal Case Retrieval

Abstract: Legal case retrieval is a specialized IR task that involves retrieving supporting cases given a query case. Compared with traditional ad-hoc text retrieval, the legal case retrieval task is more challenging since the query case is much longer and more complex than common keyword queries. Besides that, the definition of relevance between a query case and a supporting case is beyond general topical relevance and it is therefore difficult to construct a large-scale case retrieval dataset, especially one w…

Cited by 101 publications (101 citation statements)
References 19 publications
“…The same line of research has been explored in the Competition On Legal Information Extraction/Entailment (COLIEE), during which several tasks related to the legal domain have been addressed with the support of embedding techniques. Among the approaches related to the present paper, it is worth mentioning BERT-PLI (Shao et al. 2020), which adopts BERT to capture semantic relationships at the paragraph level and then infers the relevance between two cases by aggregating paragraph-level interactions. Analogously to LEGAL-BERT, the BERT model in BERT-PLI is fine-tuned on a dataset from the legal field.…”
Section: Retrieval Of Legal Information
confidence: 99%
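The interaction-then-aggregation scheme described in the statement above can be sketched as follows. Note the heavy simplifications: in the actual BERT-PLI model (Shao et al. 2020), each (query paragraph, candidate paragraph) pair is scored by a fine-tuned BERT, and the aggregation over the interaction map is a learned RNN with attention; here a bag-of-words cosine similarity stands in for the BERT pair score, and a mean of per-query-paragraph maxima stands in for the learned aggregator.

```python
# Toy sketch of BERT-PLI-style paragraph-level interaction scoring.
# Stand-ins (NOT the real model): cosine over bag-of-words instead of a
# fine-tuned BERT pair score; mean-of-max pooling instead of an RNN+attention
# aggregator.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def paragraph_vectors(paras: list) -> list:
    # One bag-of-words vector per paragraph.
    return [Counter(p.lower().split()) for p in paras]

def relevance(query_paras: list, cand_paras: list) -> float:
    q_vecs = paragraph_vectors(query_paras)
    c_vecs = paragraph_vectors(cand_paras)
    # Interaction map, max-pooled over candidate paragraphs: how well is
    # each query paragraph matched anywhere in the candidate case?
    per_query_best = [max(cosine(q, c) for c in c_vecs) for q in q_vecs]
    # Stand-in aggregator (the real model learns this step).
    return sum(per_query_best) / len(per_query_best)

# Hypothetical mini-cases for illustration only.
query = ["the defendant breached the contract", "damages were awarded"]
candidate = ["the contract was breached by the defendant",
             "the court awarded damages"]
score = relevance(query, candidate)
```

The max-pooling step mirrors the intuition that a supporting case is relevant if each part of the query case finds at least one strongly matching passage somewhere in it.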
“…
• LEGAL-BERT-BASE, the LEGAL-BERT model fine-tuned by Chalkidis et al. (2020) on a wide set of legal documents related to EU, UK and US law;
• LEGAL-BERT-SMALL, the LEGAL-BERT model fine-tuned by Chalkidis et al. (2020) on the same set of documents used for LEGAL-BERT-BASE, but in a lower-dimensional embedding space;
• LEGAL-BERT-EURLEX, the LEGAL-BERT model fine-tuned by Chalkidis et al. (2020) on the EUR-LEX dataset;
• BERT-PLI, the BERT-based system fine-tuned on a small set of legal documents, proposed by Shao et al. (2020) in the Competition On Legal Information Extraction/Entailment (COLIEE).…”
Section: Experimental Setting
confidence: 99%
“…While this trend is in its early stages, its maturation could help to deal with some of the above-mentioned limitations. However, it was observed by many authors (Alberts et al 2020, Bhattacharya et al 2020, Chalkidis et al, 2020, Draijer 2019, Raghav et al 2016, Shao et al 2020, Van Opijnen & Santos 2017, Xiao et al 2019, Wang et al 2019, Zhong et al 2020) that current neural systems for natural language understanding that perform very well in non-legal domains do not transfer easily to tasks in the legal domain, for a variety of reasons that make this domain especially challenging (see Table 1).…”
Section: Searching For Legal Documents At Paragraph Level: Automating Label Generation and Use Of An Extended Attention Mask For Boosting
confidence: 99%
“…For a user, it can be helpful to first find the most semantically similar paragraphs containing a few relevant legal concepts in a lengthy and complex case document, before reading the whole document. Defining the task this way also has a practical advantage: many state-of-the-art (neural) models struggle to encode the semantics of a very long text with hundreds of sentences in a useful way, whereas text at the level of one or a few sentences is a more realistic input for such models (Alberts et al. 2020, Shao et al. 2020). Ideally, such paragraph-level semantic similarity models should be invariant to differences in the input that do not matter for approximating relevance in this task, while being more selective to differences that do matter (Neculoiu et al. 2016).…”
Section: Searching For Legal Documents At Paragraph Level: Automating Label Generation and Use Of An Extended Attention Mask For Boosting
confidence: 99%
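The length limitation motivating the paragraph-level formulation above can be illustrated with a simple pre-processing step: splitting a long case document into paragraph-sized windows before feeding a length-limited encoder. This is a generic sketch, not the pipeline of any cited system; the 200-token budget and whitespace tokenization are illustrative assumptions.

```python
# Minimal sketch: chunk a long document into windows of at most `max_tokens`
# tokens, respecting paragraph boundaries where possible. Illustrative
# assumptions: whitespace tokenization, blank-line paragraph separators,
# and a 200-token budget (real encoders like BERT cap subword tokens, not words).

def paragraph_windows(text: str, max_tokens: int = 200) -> list:
    windows, current = [], []
    for para in text.split("\n\n"):  # paragraphs separated by blank lines
        tokens = para.split()
        if not tokens:
            continue
        # Flush the current window if this paragraph would overflow it.
        if current and len(current) + len(tokens) > max_tokens:
            windows.append(" ".join(current))
            current = []
        # A single paragraph longer than the budget is hard-split.
        while len(tokens) > max_tokens:
            windows.append(" ".join(tokens[:max_tokens]))
            tokens = tokens[max_tokens:]
        current.extend(tokens)
    if current:
        windows.append(" ".join(current))
    return windows
```

Each window can then be encoded independently, which is exactly the property the paragraph-level task formulation exploits.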