2015
DOI: 10.15837/ijccc.2015.3.1923

An Application of Latent Semantic Analysis for Text Categorization

Abstract: Discovering the major topics in a text corpus is a challenging task. The topics provide a better understanding of the whole corpus, and finding them can be regarded as a text categorization problem. The goal of this paper is to apply the latent semantic analysis (LSA) approach to extract common factors that represent concepts hidden in a large collection of texts. LSA involves three steps: the first step is to set up a term-document matrix; the second step is to transform the term frequencies into a term-document matrix using various weighting s…
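The abstract names the first two LSA steps but is cut off before the third. In standard LSA, that step is a truncated singular value decomposition (SVD) of the weighted term-document matrix. The following minimal NumPy sketch illustrates the three steps under stated assumptions and is not the paper's implementation. Whitespace tokenization and TF-IDF weighting (count times log(N/df)) are assumptions that stand in for the truncated "various weighting s…". The helper names and toy documents are hypothetical.

```python
import numpy as np

def term_document_matrix(docs):
    """Step 1: raw term counts, one row per term and one column per document.
    Assumption: simple lower-cased whitespace tokenization."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            counts[index[w], j] += 1
    return vocab, counts

def tfidf_weight(counts):
    """Step 2, ASSUMED weighting scheme: count * log(N / df), where df is the
    number of documents containing the term. Terms in every document get 0."""
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)
    idf = np.log(n_docs / df)
    return counts * idf[:, np.newaxis]

def lsa(weighted, k):
    """Step 3 (standard LSA): keep the k largest singular values of the
    weighted matrix, so that A is approximated by U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    term_vectors = U[:, :k]                    # terms in k-dim concept space
    doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document
    return term_vectors, s[:k], doc_vectors

# Hypothetical toy corpus: two "animal" and two "finance" documents.
docs = ["cat sat on mat", "dog sat on log",
        "stock market fell", "market stock rose"]
vocab, counts = term_document_matrix(docs)
weighted = tfidf_weight(counts)
terms_k, sing_k, docs_k = lsa(weighted, k=2)
print(docs_k.shape)  # (4, 2)
```

Each document becomes a k-dimensional concept vector that can be compared by cosine similarity or passed to a classifier. The choice of k = 2 here is arbitrary. Keeping all singular values reproduces the weighted matrix exactly, while a smaller k discards the weaker factors.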

Cited by 12 publications (1 citation statement)
References 42 publications (59 reference statements)
“…Traditional supervised text classification methods such as Support Vector Machines (SVM), Naïve Bayes, decision trees, and Latent Semantic Analysis (LSA) K-Nearest Neighbor (KNN) generally presented by the terms and their feature weights, also known as the "Bag of Word" (BOW) representation model. The number of words determines the word vector dimension in the vocabulary, which usually results in a very high and sparse dimensional document vector [15][16][17][18][19][20][21].…”
Section: Semantic Text Classification Algorithms
Citation type: mentioning, confidence: 99%
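The citing statement notes that in the bag-of-words model, each document vector has one dimension per vocabulary word, so the vectors become high-dimensional and sparse. The following toy sketch illustrates that point. The three example sentences, the whitespace tokenizer, and the helper names are hypothetical and are not taken from either paper.

```python
from collections import Counter

def build_vocabulary(docs):
    """Map each distinct lower-cased token to a column index (hypothetical helper)."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def bow_vector(doc, vocab):
    """Dense bag-of-words count vector with one dimension per vocabulary word."""
    vec = [0] * len(vocab)
    for tok, n in Counter(doc.lower().split()).items():
        if tok in vocab:  # out-of-vocabulary tokens are dropped
            vec[vocab[tok]] = n
    return vec

# Hypothetical example documents with no shared words.
docs = ["SVM and Naive Bayes classify text",
        "KNN compares document vectors",
        "LSA reduces the sparse term space"]
vocab = build_vocabulary(docs)
vectors = [bow_vector(d, vocab) for d in docs]
print(len(vocab))  # 16
```

Here every vector has 16 dimensions but only 4 to 6 nonzero entries. With a realistic vocabulary, the gap between dimension and nonzero entries grows much larger, which is the sparsity that dimension-reduction methods such as LSA address.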