Recent advances in artificial intelligence (AI) have yielded promising results in solving complex problems in natural language processing (NLP), making it an important tool for expediting the resolution of judicial proceedings in the legal domain. In this context, this work targets the problem of detecting the degree of similarity between judicial documents that can be achieved in the inference group, by applying six transformer-based NLP techniques: BERT, GPT-2, and RoBERTa, each pre-trained on the Brazilian Portuguese language and each further specialized using 210,000 legal proceedings. Documents were pre-processed and had their content transformed into a vector representation using these NLP techniques. Unsupervised learning was used to cluster the lawsuits, and the quality of each model was calculated from the cosine-based distance between the elements of each group and its centroid. We observed that transformer-based models perform better than those in previous research, most notably the RoBERTa model specialized for the Brazilian Portuguese language, making it possible to advance the current state of the art in NLP applied to the legal sector.

Keywords: legal • natural language processing • clustering • transformers
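As a rough illustration of the evaluation idea summarized in the abstract, the sketch below scores each cluster by the mean cosine similarity between its members and the cluster centroid. Reading the abstract's metric as cosine similarity to the centroid is an assumption, and the function name `centroid_cosine_scores`, the toy two-dimensional vectors, and the cluster labels are all hypothetical and not taken from the paper.

```python
import numpy as np


def centroid_cosine_scores(embeddings, labels):
    """Map each cluster label to the mean cosine similarity of its members to the centroid.

    Hypothetical helper: assumes the quality measure is cosine similarity
    between every document vector and the mean vector of its cluster.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    scores = {}
    for label in np.unique(labels):
        members = embeddings[labels == label]
        centroid = members.mean(axis=0)
        # Cosine similarity = dot product / (product of the vector norms).
        norms = np.linalg.norm(members, axis=1) * np.linalg.norm(centroid)
        sims = members @ centroid / np.maximum(norms, 1e-12)
        scores[label.item()] = float(sims.mean())
    return scores


# Toy stand-ins for document embeddings (not real model output).
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
toy_labels = np.array([0, 0, 1, 1])
scores = centroid_cosine_scores(toy_embeddings, toy_labels)
```

Under this reading, a score close to 1 indicates that the documents in a group point in nearly the same direction as their centroid, so higher values suggest more cohesive clusters.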
Introduction

The recent history of the Brazilian justice system shows significant progress toward having all procedural documents in digital format. In 2012, the Brazilian Labor Court implemented the Electronic Judicial Process ("Processo Judicial Eletrônico" in Portuguese, abbreviated PJe), and since then all new lawsuits have been completely digital, reaching 99.9% of cases in progress on this platform in 2020 [1].

Human beings are limited in their ability to analyse a large amount of data in an acceptable time, especially when the data appear to be uncorrelated. Data analysis together with computational and statistical methods can help them with pattern recognition. As the volume of textual data has been increasing exponentially, examining patterns in court documents is becoming markedly more challenging.

To optimize procedural progress, the Brazilian legal system provides principles such as procedural economy, the principle of speed, due process, and the reasonable duration of a case, to ensure the swift handling of judicial proceedings [2]. Hence, one of the major challenges of the Brazilian justice system is swiftly meeting the growing judicial demand. Thus, using a process grouping mechanism, it was possible to assist with the allocation