In today's legal environment, lawsuits and regulatory investigations require companies to embark upon increasingly intensive data-focused engagements to identify, collect and analyze large quantities of data. When documents are staged for reviewwhere they are typically assessed for relevancy or privilegethe process can require companies to dedicate an extraordinary level of resources, both with respect to human resources, but also with respect to the use of technology-based techniques to intelligently sift through data. Companies regularly spend millions of dollars producing 'responsive' electronically-stored documents for these types of matters. For several years, attorneys have been using a variety of tools to conduct this exercise, and most recently, they are accepting the use of machine learning techniques like text classification (referred to as predictive coding in the legal industry) to efficiently cull massive volumes of data to identify responsive documents for use in these matters. In recent years, a group of AI and Machine Learning researchers have been actively researching Explainable AI. In an explainable AI system, actions or decisions are human understandable. In typical legal 'document review' scenarios, a document can be identified as responsive, as long as one or more of the text snippets (small passages of text) in a document are deemed responsive. In these scenarios, if predictive coding can be used to locate these responsive snippets, then attorneys could easily evaluate the model's document classification decision. When deployed with defined and explainable results, predictive coding can drastically enhance the overall quality and speed of the document review process by reducing the time it takes to review documents. Moreover, explainable predictive coding provides lawyers with greater confidence in the results of that supervised learning task. The authors of this paper propose the concept of explainable predictive coding and simple explainable predictive coding methods to locate responsive snippets within responsive documents. We also report our preliminary experimental results using the data from an actual legal matter that entailed this type of document review. The purpose of this paper is to demonstrate the feasibility of explainable predictive coding in the context of professional services in the legal space. Keywords-machine learning, text categorization, explainable AI, predictive coding, explainable predictive coding, legal document reviewI.
Companies regularly spend millions of dollars producing electronically-stored documents in legal matters. Over the past two decades, attorneys have been using a variety of technologies to conduct this exercise, and most recently, parties on both sides of the 'legal aisle' are accepting the use of machine learning techniques like text classification to cull massive volumes of data and to identify responsive documents for use in these matters. While text classification is regularly used to reduce the discovery costs in legal matters, text classification also faces a peculiar perception challenge: amongst lawyers, this technology is sometimes looked upon as a "black box." Put simply, very little information is provided for attorneys to understand why documents are classified as responsive. In recent years, a group of AI and Machine Learning researchers have been actively researching Explainable AI. In an explainable AI system, actions or decisions are human understandable. In legal 'document review' scenarios, a document can be identified as responsive, as long as one or more of the text snippets (small passages of text) in a document are deemed responsive. In these scenarios, if text classification can be used to locate these responsive snippets, then attorneys could easily evaluate the model's document classification decision. When deployed with defined and explainable results, text classification can drastically enhance the overall quality and speed of the document review process by reducing the time it takes to review documents. Moreover, explainable predictive coding provides lawyers with greater confidence in the results of that supervised learning task.This paper describes a framework for explainable text classification as a valuable tool in legal services: for enhancing the quality and efficiency of legal document review and for assisting in locating responsive snippets within responsive documents. This framework has been implemented in our legal analytics product, which has been used in hundreds of legal matters. We also report our experimental results using the data from an actual legal matter that used this type of document review.
US corporations regularly spend millions of dollars reviewing electronically-stored documents in legal matters. Recently, attorneys apply text classification to efficiently cull massive volumes of data to identify responsive documents for use in these matters. While text classification is regularly used to reduce the discovery costs of legal matters, it also faces a perception challenge: amongst lawyers, this technology is sometimes looked upon as a "black box." Put simply, no extra information is provided for attorneys to understand why documents are classified as responsive. In recent years, explainable machine learning has emerged as an active research area. In an explainable machine learning system, predictions or decisions made by a machine learning model are human understandable. In legal 'document review' scenarios, a document is responsive, because one or more of its small text snippets are deemed responsive. In these scenarios, if these responsive snippets can be located, then attorneys could easily evaluate the model's document classification decisions -this is especially important in the field of responsible AI. Our prior research identified that predictive models created using annotated training text snippets improved the precision of a model when compared to a model created using all of a set of documents' text as training. While interesting, manually annotating training text snippets is not generally practical during a legal document review. However, small increases in precision can drastically decrease the cost of large document reviews. Automating the identification of training text snippets without human review could then make the application of training text snippet-based models a practical approach. This paper proposes two simple machine learning methods to locate responsive text snippets within responsive documents without using human annotated training text snippets. The two methods were evaluated and compared with a document classification method using three datasets from actual legal matters. The results show that the two proposed methods outperform the document-level training classification method in identifying responsive text snippets in responsive documents. Additionally, the results suggest that we can automate the successful identification of training text snippets to improve the precision of our predictive models in legal document review and thereby help reduce the overall cost of review.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.