An Unsupervised Content-Based Article Recommendation System Using Natural Language Processing

Renuka, S.; Kiran, G. S. S. Raj; Rohit, Palakodeti

doi:10.1007/978-981-15-8530-2_13

Cited by 16 publications

(13 citation statements)

References 3 publications


“…Thus, we sought to expand the research by removing the restriction to the legal area bringing light to other publications. In [12], we discussed the content recommendation system approaches based on grouping for similar articles that used TF-IDF to perform vector transformation of the document contents and, through cosine similarity, applied k-means [13] for clustering them. In [14], the authors automatically summarized texts using TF-IDF and k-means to determine the document's textual groups used to create the abstract.…”

Section: State-of-the-art Review (mentioning)

confidence: 99%

Clustering by Similarity of Brazilian Legal Documents Using Natural Language Processing Approaches

Oliveira, Nascimento

2022

Data Clustering


The Brazilian legal system postulates the expeditious resolution of judicial proceedings. However, legal courts are working under budgetary constraints and with reduced staff. As a way to face these restrictions, artificial intelligence (AI) has been tackling many complex problems in natural language processing (NLP). This work aims to detect the degree of similarity between judicial documents that can be achieved in the inference group using unsupervised learning, by applying three NLP techniques, namely term frequency-inverse document frequency (TF-IDF), Word2Vec CBoW, and Word2Vec Skip-gram, the last two being specialized with a Brazilian language corpus. We developed a template for grouping lawsuits, which is calculated based on the cosine distance between the elements of the group to its centroid. The Ordinary Appeal was chosen as a reference file since it triggers legal proceedings to follow to the higher court and because of the existence of a relevant contingent of lawsuits awaiting judgment. After the data-processing steps, documents had their content transformed into a vector representation, using the three NLP techniques. We notice that specialized word-embedding models—like Word2Vec—present better performance, making it possible to advance in the current state of the art in the area of NLP applied to the legal sector.
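The grouping measure described in this abstract, the cosine distance between each element of a group and the group's centroid, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the toy vectors are assumptions made for illustration, and the vectors stand in for any document representation (TF-IDF or Word2Vec).

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: a zero vector counts as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def mean_distance_to_centroid(vectors):
    """Average cosine distance from each group member to the group centroid."""
    c = centroid(vectors)
    return sum(cosine_distance(v, c) for v in vectors) / len(vectors)

# Hypothetical document vectors: one cohesive group, one scattered group.
tight_group = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [1.0, 1.0, 0.0]]
loose_group = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(mean_distance_to_centroid(tight_group))
print(mean_distance_to_centroid(loose_group))
```

A lower value indicates a more cohesive group, so the first group scores lower than the second.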


“…Therefore, we then sought to expand the research by removing the restriction for the legal area, which revealed some publications. [16] Discusses using a content recommendation system based on grouping, with k-means, in similar articles through the vector transformation of the content of documents with the TF-IDF [17]. In [18], the authors performed an automatic summarization of texts using TF-IDF and k-means to determine the sentence groups of the documents used in creating the summary.…”

Section: State-of-the-art Review (mentioning)

confidence: 99%

Brazilian Court Documents Clustered by Similarity Together Using Natural Language Processing Approaches with Transformers

Oliveira, Nascimento

2022

Preprint


Recent advances in artificial intelligence (AI) have produced promising results in solving complex problems in the area of natural language processing (NLP), making it an important tool to help in the expeditious resolution of judicial proceedings in the legal area. In this context, this work targets the problem of detecting the degree of similarity between judicial documents that can be achieved in the inference group, by applying six NLP techniques based on transformers, namely BERT, GPT-2 and RoBERTa pre-trained in the Brazilian Portuguese language and the same models specialized using 210,000 legal proceedings. Documents were pre-processed and had their content transformed into a vector representation using these NLP techniques. Unsupervised learning was used to cluster the lawsuits, calculating the quality of the model based on the cosine of the distance between the elements of the group to its centroid. We noticed that models based on transformers present better performance when compared to previous research, highlighting the RoBERTa model specialized in the Brazilian Portuguese language, making it possible to advance the current state of the art in the area of NLP applied to the legal sector.

Keywords: legal • natural language processing • clustering • transformers

Introduction

The recent history of the Brazilian Justice shows relevant transformations regarding having all its procedural documents in digital format. In 2012, the Brazilian Labor Court implemented the Electronic Judicial Process ("Processo Judicial Eletrônico" in Portuguese, abbreviated PJe), and since then, all new lawsuits have become completely digital, reaching 99.9% of cases in progress on this platform in 2020 [1].

Given the limits on how much data human beings can analyse in an acceptable time, especially when such data appear not to be correlated, it is possible to help them with pattern recognition through data analysis and computational and statistical methods. Assuming that textual data has been increasing exponentially, examining patterns in court documents is becoming markedly more challenging.

To optimize procedural progress, the Brazilian legal system provides for mechanisms such as the procedural economy, the principle of speed, due process in order, and the principle of the reasonable duration of a case to ensure the swift handling of judicial proceedings [2]. Hence, one of the major challenges of the Brazilian Justice is swiftly meeting the growing judicial demand. Thus, using a process grouping mechanism, it was possible to assist with the allocation


“…Kang et al [ 46 ] extract key phrases from CiteSeer to describe the diversity of recommended papers. Renuka et al [ 86 ] apply rapid automatic keyword extraction.…”

Section: Literature Review (mentioning)

confidence: 99%

“…Renuka et al [ 86 ] propose a paper recommendation approach utilising TF-IDF representations of automatically extracted keywords and key phrases. They then either use cosine similarity between vectors or a clustering method to identify the most similar papers for an input paper.…”

Section: Literature Review (mentioning)

confidence: 99%


Scientific paper recommendation systems: a literature review of recent publications

Kreutz, Schenkel

2022

Int J Digit Libr


Scientific writing builds upon already published papers. Manual identification of publications to read, cite or consider as related papers relies on a researcher’s ability to identify fitting keywords or initial papers from which a literature search can be started. The rapidly increasing amount of papers has called for automatic measures to find the desired relevant publications, so-called paper recommendation systems. As the number of publications increases so does the amount of paper recommendation systems. Former literature reviews focused on discussing the general landscape of approaches throughout the years and highlight the main directions. We refrain from this perspective, instead we only consider a comparatively small time frame but analyse it fully. In this literature review we discuss used methods, datasets, evaluations and open challenges encountered in all works first released between January 2019 and October 2021. The goal of this survey is to provide a comprehensive and complete overview of current paper recommendation systems.
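The citation statements above summarise the cited approach as TF-IDF representations of extracted keywords compared by cosine similarity. A minimal sketch of that ranking step follows. It is an illustration under stated assumptions, not the cited authors' code: the keyword lists are hypothetical and assumed to be already extracted (the keyword-extraction step is omitted), a smoothed IDF variant is used, and `tfidf_vectors`, `cosine_similarity` and `recommend` are names invented here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenised document to a sparse {term: weight} TF-IDF dict."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(query_index, docs, top_k=2):
    """Return indices of the top_k documents most similar to the query."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scored = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]

# Hypothetical keyword lists, one per paper.
papers = [
    ["tfidf", "clustering", "legal", "documents"],
    ["tfidf", "clustering", "news", "articles"],
    ["protein", "folding", "simulation"],
    ["legal", "documents", "similarity", "tfidf"],
]

print(recommend(0, papers, top_k=2))
```

With these toy inputs, the paper sharing the most keywords with the query is ranked first and the unrelated paper is left out of the top two.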
