Precedents constitute the starting point of judges' reasoning in national legal systems. Precedents are also an essential input for case-based reasoning (CBR) methodologies. Although considerable research has been done on CBR applied to legal practice, precedent retrieval techniques remain a relatively new and unexplored field of AI & Law. Only a few works have tested or developed methods for identifying such previous similar cases. This work uses text mining (TM), natural language processing (NLP), and data visualization methods to provide a semi-automated rapid literature review and identify how courts of justice and legal practitioners may use AI to retrieve similar cases. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), automation techniques were used to expedite the literature review. In this study, we confirmed the feasibility of automation tools for expediting literature reviews and provided an overview of the current state of research on legal precedent retrieval.
Judges frequently base their reasoning on precedents. In every circumstance, courts must preserve uniformity in case law and, depending on the legal system, previous cases compel rulings. The search for methods to accurately identify similar previous cases is not new; such methods are a vital input, for example, to case-based reasoning (CBR) methodologies. Innovations in language processing and machine learning (ML) brought momentum to identifying precedents while providing tools for automating this task. This rapid literature review investigated how research on the identification of legal precedents has evolved. It also examined the most promising automation strategies for this task and confirmed the growing interest in using artificial intelligence for legal precedent retrieval. The findings demonstrate that no artificial intelligence solution currently stands out as the most effective at finding past similar cases. Also, existing results require validation with statistically significant samples and ground truth provided by specialists. In addition, this work employed text mining (TM) to automate part of the literature review while still delivering an accurate picture of research in the field. Ultimately, this review suggests directions for future work, as more experimentation is required.
Decisions of regulatory government bodies and courts affect many aspects of citizens’ lives. These organizations and courts are expected to provide timely and coherent decisions, although they struggle to keep up with the increasing demand. Some recent works have assessed the ability of machine learning (ML) models to predict such decisions based on past cases under similar circumstances. The dominant conclusion is that the prediction goal is achievable with high accuracy. Nevertheless, most of those works do not consider aspects of ML models that can impact performance and affect real-world usefulness, such as consistency, out-of-sample applicability, generality, and explainability preservation. To our knowledge, none considered all those aspects, and no previous study addressed the joint use of metadata and text-extracted variables to predict administrative decisions. We propose a predictive model, based on a two-stage cascade classifier, that addresses the abovementioned concerns. The model employs a first-stage prediction based on textual features extracted from the original documents and a second-stage classifier that includes proceedings’ metadata. The study was conducted using time-based cross-validation, built on data available before the predicted judgment. It provides predictions as soon as the decision date is scheduled and only considers the first document in each proceeding, along with the metadata recorded when the infringement is first registered. Finally, the proposed model provides local explainability by preserving visibility of the textual features and employing SHapley Additive exPlanations (SHAP). Our findings suggest that this cascade approach surpasses the standalone stages and achieves relatively high Precision and Recall when both text and metadata are available while preserving real-world usefulness. With a weighted F1 score of 0.900, the results outperform the text-only baseline by 1.24% and the metadata-only baseline by 5.63%, with better discriminative properties evaluated by the receiver operating characteristic and precision-recall curves.
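The time-based cross-validation mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `expanding_window_splits`, the expanding-window scheme, and the fold sizing are all assumptions made for illustration. The only property it tries to capture is that each model is trained on proceedings dated no later than the ones it is evaluated on.

```python
from datetime import date


def expanding_window_splits(dates, n_folds):
    """Yield (train_idx, test_idx) pairs in chronological order.

    Hypothetical helper: no training item is dated later than any
    test item in the same fold, and the training window grows with
    each fold.
    """
    # Indices of the samples sorted from oldest to newest.
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    fold_size = len(order) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = order[: k * fold_size]
        # The last fold absorbs any remainder so every sample is used.
        end = (k + 1) * fold_size if k < n_folds else len(order)
        test = order[k * fold_size : end]
        yield train, test


if __name__ == "__main__":
    # Made-up decision dates, purely illustrative.
    decision_dates = [
        date(2021, 3, 1), date(2020, 1, 1), date(2022, 5, 1),
        date(2020, 6, 1), date(2021, 9, 1), date(2022, 11, 1),
    ]
    for train_idx, test_idx in expanding_window_splits(decision_dates, 2):
        print("train:", train_idx, "test:", test_idx)
```

An expanding window is only one way to respect chronology; a sliding window of fixed length would serve the same purpose when older proceedings are thought to be less representative.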
Precedent is the cornerstone of the Common law system. Even in jurisdictions that follow Civil law, precedents constrain decisions when case law is sufficiently uniform. A systematic disregard of precedents makes judgments less coherent and the law less just. Relying on precedents can also make courts more efficient, and recent advances in natural language processing (NLP) and machine learning (ML) open the door to automated and reliable identification of similar cases. In this study, we investigated more than a hundred combinations of document representations and textual vectorization models to assess whether pairs of cases identified by the machine satisfy the human notion of similarity. To date, analogous models have been evaluated using tiny validation samples. We used a statistically significant sample evaluated by legal experts from an administrative court in Brazil, constituting a gold standard sample. We also propose using evaluation metrics that are meaningful to real-world applications and build upon previous works, employing promising solutions and exploring the extraction of concepts and relationships from legal texts. The results demonstrate that such applications can identify a large proportion of similar cases that can be interpreted as legal precedents. Models that rely on more granular representations of text achieved the best performance. In addition, extracting concepts and relations improved the results, while using models that are more complex and difficult to train may not be the best option. These findings can guide the development of recommendation systems to improve efficiency and consistency in law courts and motivate studies that explore other techniques for this purpose.
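To make the idea of pairing a document representation with a vectorization model concrete, here is a minimal sketch that ranks cases by cosine similarity over TF-IDF vectors. TF-IDF with whitespace tokenization is an assumption chosen for simplicity; the abstract does not say which of the evaluated combinations it resembles, and the function names and toy case texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms present in every document keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs, query_idx, top_k=3):
    """Rank the other documents by similarity to docs[query_idx]."""
    vecs = tfidf_vectors(docs)
    scores = [
        (i, cosine(vecs[query_idx], v))
        for i, v in enumerate(vecs)
        if i != query_idx
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy case summaries, invented for illustration only.
    cases = [
        "tax penalty late filing appeal",
        "late filing tax penalty reduced",
        "environmental license denied river",
    ]
    print(most_similar(cases, query_idx=0, top_k=2))
```

In a setting like the one described, the ranked pairs would then be compared against expert judgments of similarity rather than taken at face value.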