2023
DOI: 10.3389/frai.2023.1279794

The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts

Jaromir Savelka,
Kevin D. Ashley

Abstract: The emergence of ChatGPT has sensitized the general public, including the legal profession, to large language models' (LLMs) potential uses (e.g., document drafting, question answering, and summarization). Although recent studies have shown how well the technology performs in diverse semantic annotation tasks focused on legal texts, an influx of newer, more capable (GPT-4) or cost-effective (GPT-3.5-turbo) models requires another analysis. This paper addresses recent developments in the ability of LLMs to sema…
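
As a rough illustration of the zero-shot annotation setting the abstract describes, the following hypothetical Python sketch asks an OpenAI chat model (gpt-3.5-turbo here; gpt-4 is called the same way) to assign one semantic type to a sentence from a legal decision. The label set and prompt wording are illustrative assumptions, not the paper's exact protocol.

# Hypothetical zero-shot annotation sketch; label set and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["Finding", "Evidence", "LegalRule", "Citation", "Reasoning"]

def annotate(sentence: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the chat model to assign exactly one semantic type to the sentence."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output is preferable for annotation
        messages=[
            {"role": "system",
             "content": ("You annotate sentences from legal decisions. "
                         "Reply with exactly one label from: " + ", ".join(LABELS) + ".")},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content.strip()

print(annotate("The examiner opined that the hearing loss is related to service."))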

Cited by 9 publications (3 citation statements)
References: 38 publications
“…Sarkar et al. [56] evaluated multiple techniques, including LLMs (BERT), in zero/few-shot classification of legal texts. GPT models have already been applied to analyse legal cases, for example, to: annotate sentences’ roles in Board of Veterans’ Appeals (BVA) cases, such as finding, evidence, legal rule, citation or reasoning [57]; predict Supreme Court Justice decisions [58]; determine how well a case passage explains a statutory term [59]; or generate interpretations of a term based on such passages [60,61]. Other studies by Blair-Stanek et al., Nguyen et al. and Janatian et al. focused on the capabilities of GPT models to conduct legal reasoning [62–64], to model US Supreme Court cases [58], to give legal information to laypeople [65], and to support online dispute resolution [66].…”
Section: Related Work (mentioning)
confidence: 99%
“…With LLMs, embeddings are numerical representations of words, phrases, or sentences that capture contextual information and relationships within large segments of text. They have been employed in various tasks, such as text retrieval and ranking (e.g., Qadrud-Din et al. (2020)), text classification (e.g., Chae and Davidson (2023)), and sentiment analysis (e.g., Savelka and Ashley (2023)). In this project, our focus lies in embeddings extracted from LLMs.…”
Section: Related Work (mentioning)
confidence: 99%
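
The embeddings-as-features approach mentioned in this excerpt can be illustrated with a hypothetical sketch: sentence embeddings are obtained from an LLM API and passed to a lightweight classifier. The embedding model name, the toy training data, and the scikit-learn classifier below are assumptions made for illustration, not choices reported in the cited works.

# Hypothetical sketch: LLM embeddings as features for text classification.
from openai import OpenAI
from sklearn.linear_model import LogisticRegression

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    """Return one embedding vector per input text."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

# Toy training data (placeholder sentences and labels).
train_texts = [
    "The Board finds the evidence of record persuasive.",
    "Service connection requires evidence of a current disability.",
]
train_labels = ["Finding", "LegalRule"]

clf = LogisticRegression(max_iter=1000).fit(embed(train_texts), train_labels)
print(clf.predict(embed(["The examiner's report supports the claim."])))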
“…To develop the prompt, we follow [19], and provide the model with almost an exact copy of the annotation guidelines provided to annotators in [1] (cf. [20] where only excerpts are used). We call this "guideline-prompting".…”
Section: Prompt Development (mentioning)
confidence: 99%
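
A minimal sketch of the “guideline-prompting” idea described in this excerpt, assuming an OpenAI chat model: the system prompt carries a near-verbatim copy of the guidelines given to human annotators, and the model labels one sentence at a time. The guideline text and label names below are placeholders, not the actual guidelines from [1].

# Hypothetical guideline-prompting sketch; the guidelines below are placeholders.
from openai import OpenAI

client = OpenAI()

GUIDELINES = """You will label sentences from adjudicatory decisions.
Assign exactly one of the following types and reply with the type name only:
- Finding: a factual determination made by the adjudicator.
- Evidence: a description of evidence in the record.
- LegalRule: a statement of a legal rule, standard, or requirement.
- Citation: a reference to legal authority or to the record.
- Reasoning: reasoning that connects the evidence to a finding."""

def guideline_annotate(sentence: str, model: str = "gpt-4") -> str:
    """Classify one sentence using the full guidelines as the system prompt."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": GUIDELINES},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content.strip()

print(guideline_annotate("The Board finds that the veteran's tinnitus began in service."))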