2022
DOI: 10.11591/ijeecs.v27.i3.pp1517-1524

Document classification using term frequency-inverse document frequency and K-means clustering

Abstract: Advances across a variety of study subjects and information technologies have increased the number of published research articles. However, researchers face difficulties and devote a significant amount of time to locating scientific research publications relevant to their domain of expertise. In this article, a document classification approach is presented that clusters the text documents of research articles into expressive groups, each encompassing a similar scientific field. The main focus and scop…

citations
Cited by 8 publications
(6 citation statements)
references
References 31 publications
“…The Porter Stemming algorithm, for instance, employs a series of about 60 rules applied sequentially. Each rule is of the form: (1) where S1 is a suffix to be replaced by S2 if a condition (usually related to the measure of the stem, m) is satisfied. The measure m is calculated as:…”
Section: Analysis and Comparison Of The Existing Text Processing Tech...
Citation type: mentioning
confidence: 99%
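
The quoted statement cuts off before the rule form and the formula for m are shown. As a minimal sketch, assuming Porter's original definition (a stem has the shape [C](VC)^m[V], where C and V are runs of consonants and vowels) and his example rule (m > 0) EED -> EE, the measure and a single rule application could look like the following. The function names `is_consonant`, `measure`, and `apply_rule` are hypothetical and chosen only for illustration.

```python
def is_consonant(word, i):
    """Porter's convention: a, e, i, o, u are vowels; 'y' is a vowel
    only when it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # 'y' counts as a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count the vowel-run -> consonant-run transitions, i.e. m in [C](VC)^m[V]."""
    stem = stem.lower()
    m = 0
    prev_is_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_is_vowel:
            m += 1
        prev_is_vowel = not cons
    return m


def apply_rule(word, s1, s2, min_measure):
    """Replace suffix s1 with s2 when the remaining stem has m > min_measure."""
    if word.endswith(s1):
        stem = word[: len(word) - len(s1)]
        if measure(stem) > min_measure:
            return stem + s2
    return word


print(measure("tree"), measure("trouble"), measure("troubles"))
print(apply_rule("agreed", "eed", "ee", 0), apply_rule("feed", "eed", "ee", 0))
```

Under these assumptions, "agreed" is shortened because its stem "agr" has m = 1, while "feed" is left alone because its stem "f" has m = 0.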
“…Al-Obaydy, Hashim, Najm and Jalal propose an innovative approach for categorizing research articles into thematic groups, leveraging Term Frequency-Inverse Document Frequency (TF-IDF) and K-means clustering. The methodology is designed to address the challenges researchers face in navigating the vast corpus of scientific literature, aiming to cluster text documents into meaningful groups that represent similar scientific fields [1]. Shetty and Kallimani introduce an innovative approach leveraging K-Means clustering for extractive text summarization, focusing on preserving semantic richness while eliminating redundancy [2].…”
Section: Introduction (Literary Review)
Citation type: mentioning
confidence: 99%
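
The pipeline summarized in this statement, TF-IDF weighting followed by K-means clustering, can be sketched in plain Python. This is an illustrative sketch and not the authors' implementation: the whitespace tokenizer, the idf variant log(N/df), the L2 normalization, the farthest-first centroid initialization, and the four toy documents are all assumptions made here to keep the example small and deterministic.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of strings."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = max(len(doc), 1)
        vec = [(counts[t] / length) * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm with a deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "neural network image classification",
    "deep neural network training",
    "protein gene expression",
    "gene sequencing protein structure",
]
_, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

On this toy corpus the two machine-learning titles share a cluster and the two biology titles share the other, because documents with no terms in common are orthogonal after normalization.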
“…The K-means algorithm provides a simple method for computing an approximate solution [11]. It is an exclusive clustering method and one of the most widely used clustering algorithms [12]. The algorithm has been widely used in previous studies [13].…”
Section: Introduction
Citation type: unclassified
“…The Okapi BM25 model [10] is a popular probabilistic model that uses criteria such as term frequency, document length, and document frequency to compute the relevance score of a document [11], [12]. Reinforcement learning Reinforcement learning is a type of machine learning that involves training an agent to learn by trial and error.…”
Section: Knowledge Graphs
Citation type: mentioning
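
The three ingredients named in this statement (term frequency, document length, and document frequency) combine in the Okapi BM25 score roughly as sketched below. This is a hedged illustration, not the citing paper's code: the smoothed idf form ln((N - n + 0.5) / (n + 0.5) + 1), the parameter defaults k1 = 1.5 and b = 0.75, the whitespace tokenizer, and the example documents are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n  # average document length
    df = Counter(t for d in tokenized for t in set(d))  # document frequency
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger weight (smoothed, non-negative idf).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates with k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "k-means clustering of research articles",
    "reinforcement learning agent",
    "clustering clustering documents",
]
print(bm25_scores("clustering", docs))
```

A document that never mentions the query term scores zero, and a shorter document that repeats the term outranks a longer one that mentions it once.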
confidence: 99%