Analysis of Document Clustering based on Cosine Similarity and K-Main Algorithms

Triwijoyo, Bambang Krismono; Kartarina, Kartarina

doi:10.33557/journalisi.v1i2.18

Cited by 3 publications

(3 citation statements)

References 16 publications


“…These approaches are combined to address the problem by bringing together the strengths of previous studies. This study also uses K-Means to cluster final project texts (title + abstract), which generally relies on cosine similarity as in other similar studies, although those aim at recommending final project supervisors [7] or at computing the similarity of document clustering results [8].…”

Section: Introduction (unclassified)

Pemodelan Topik dengan LDA untuk Temu Kembali Informasi dalam Rekomendasi Tugas Akhir

Purwitasari

Muflichah

Hasanah

et al. 2021

RESTI


An undergraduate thesis, the final project known in Indonesian as Tugas Akhir, is a prerequisite for graduation, and completing it successfully is one of the expected learning outcomes. Choosing a final project topic that matches a student's abilities is therefore important. One strategy is to read the literature, but this is time-consuming. A recommendation system is needed to help students choose a topic that fits their abilities or subject understanding, as reflected in their academic transcripts. This study focuses on a system for final project topic recommendation based on evaluating the competencies recorded in the academic transcripts of graduated students. The collected data from previous final projects, namely titles and abstracts, were weighted by TF-IDF (term frequency–inverse document frequency) and grouped using K-Means clustering. From each resulting cluster, we prepared candidate topics for recommendation using Latent Dirichlet Allocation (LDA) with Gibbs sampling, focusing on the word distribution of each topic in the cluster. We evaluated the optimal number of clusters and the number of topics, and then explored the recommendation results more thoroughly. Our experiments showed that the proposed system could recommend final project topic ideas based on student competence as represented in academic transcripts.
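The TF-IDF weighting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, raw counts for term frequency, and idf = log(N / df); the abstract does not say which variant the authors used, and the function name `tf_idf` and the sample texts are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not from the cited paper): whitespace tokenization,
    raw term counts for TF, and idf = log(N / df) without smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical titles, not taken from the paper's dataset.
docs = [
    "clustering thesis abstracts with k-means",
    "topic modeling of thesis abstracts",
    "image segmentation with neural networks",
]
weights = tf_idf(docs)
```

In this toy corpus a term that occurs in only one document ("clustering") receives a higher weight than one shared by two documents ("thesis"), which is the property that makes the weighted vectors useful for clustering.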


“…Cosine similarity is another method that uses unsupervised learning techniques like Word2Vec CBoW, Word2Vec Skip-gram, and TF-IDF to determine how similar court documents are to one another [10]. Cosine similarity is employed to group lawful [11]. This research aims to confidently assist lawmakers and legal writers in thoroughly searching for keywords (norms) in UU using Bahasa and implementing them in lawmaking with the aid of a search engine, this approach will significantly save time in understanding each existing law.…”

Section: Introduction (mentioning)

confidence: 99%

Implementation of Cosine Similarity Algorithm on Omnibus Law Drafting

Syam

et al. 2024

IJACSA


Drafting Omnibus Laws presents a complex challenge in legal governance, often involving the integration and consolidation of disparate legal provisions into a unified framework. In this context, computational techniques become crucial for streamlining the drafting process and ensuring coherence across the law's various components. Cosine similarity, a widely used measure in natural language processing and document analysis, offers a quantitative means to assess the similarity between different sections or articles within an Omnibus Law draft. With legal texts represented as high-dimensional vectors in a vector space model, textual similarity is measured as the cosine of the angle between these vectors. Implemented with FastAPI and Laravel, cosine similarity can be a valuable tool for analyzing similarity between legal documents, especially in the context of omnibus law. Legal practitioners and researchers can use the measure to compare the textual content of different legal documents and identify similarities. This can aid in tasks such as legal document retrieval, clustering similar provisions, and detecting potential inconsistencies. The combination of FastAPI and Laravel provides an efficient way to develop and deploy this functionality, contributing to the advancement of legal informatics and analysis. The dataset consists of 1705 Undang-Undang (UU, laws) written in Bahasa and dating from 1945 to 2022. The implemented cosine similarity yielded a recall of 90.10% on this collection of laws.
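The measure described in this abstract, the cosine of the angle between two document vectors, can be sketched as follows. This is a minimal illustration that assumes plain bag-of-words counts as the vector representation; the helper names and the three sample sentences are hypothetical and are not drawn from the UU dataset or from the authors' FastAPI and Laravel implementation.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors (an assumption)
    return dot / (norm_a * norm_b)

def bag_of_words(text):
    """Hypothetical helper: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())

# Made-up provisions for illustration only.
article_a = bag_of_words("the employer shall pay wages on time")
article_b = bag_of_words("the employer shall pay severance on termination")
article_c = bag_of_words("forest permits are issued by the minister")

sim_ab = cosine_similarity(article_a, article_b)  # five shared terms
sim_ac = cosine_similarity(article_a, article_c)  # one shared term
```

Because the score depends only on the angle between the vectors, two provisions with overlapping vocabulary score higher than two provisions on unrelated subjects, regardless of their length.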


“…The core objectives of author research are twofold. Firstly, author endeavour to elevate the performance of document clustering by incorporating cosine similarity within the framework of the K-Means algorithm [9]. This approach is selected for its effectiveness in capturing semantic similarities among textual data, rendering it particularly apt for the nuanced analysis required in comment clustering.…”

Section: Introduction (mentioning)

confidence: 99%

Performance Optimization of Document Clustering for Harry Potter Series Comments using Cosine Similarity

Septian

Zikry

Dwi Putriani

2024

JISIT


This research delves into the distinctive realm of comment clustering, focusing on the extensive discourse generated by the Harry Potter series. Leveraging a dataset from Kaggle, the study aims to optimize document clustering using cosine similarity within the K-Means algorithm. The research addresses the nuanced dynamics of sentiment and preferences within the Harry Potter fan community. A comprehensive methodology involves data collection, preprocessing, TF-IDF initialization, K-Means clustering with varying distance metrics, and result evaluation. The dataset of 491 respondents unveils diverse gender, geographical, and age distributions, adding complexity to the analysis. The K-Means clustering results highlight predominant positive sentiment, emphasizing the enduring popularity of the series. The study's originality lies in its focus on the Harry Potter cultural phenomenon, contributing to sentiment analysis and fan engagement discourse. The implications extend to researchers, practitioners, and enthusiasts seeking a deeper understanding of online discussions surrounding iconic media franchises.
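One common way to use cosine similarity inside K-Means is to normalize every vector to unit length and assign each point to the centroid with the largest dot product, a variant often called spherical k-means. The sketch below follows that idea; it is not necessarily the procedure used in the study, and the function names, the fixed initial centroids, and the toy vectors are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a dense vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(vectors, centroids, n_iter=10):
    """K-Means variant that assigns points by cosine similarity.

    Initial centroids are passed in explicitly so the run is deterministic;
    a real pipeline would typically pick them at random or with k-means++.
    """
    points = [normalize(v) for v in vectors]
    centroids = [normalize(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: on unit vectors the dot product equals the cosine.
        labels = [max(range(len(centroids)), key=lambda k: dot(p, centroids[k]))
                  for p in points]
        # Update step: mean of the members, re-normalized to unit length.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[k] = normalize(mean)
    return labels, centroids

# Toy document vectors (for example TF-IDF weights); values are made up.
vectors = [[2.0, 0.1, 0.0], [4.0, 0.3, 0.0], [0.0, 1.0, 3.0], [0.1, 0.5, 2.0]]
labels, centroids = spherical_kmeans(vectors, centroids=[vectors[0], vectors[2]])
```

The second toy vector is roughly a scaled copy of the first, so the two land in the same cluster even though their Euclidean distance is large. That length invariance is the usual reason for preferring cosine similarity when clustering texts of different sizes.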
