2010
DOI: 10.5120/826-1171

A Frequent Concepts Based Document Clustering Algorithm

Abstract: This paper presents a novel document clustering technique based on frequent concepts. The proposed technique, FCDC (Frequent Concepts based Document Clustering), is a clustering algorithm that works with frequent concepts rather than the frequent items used in traditional text mining techniques. Many well-known clustering algorithms treat documents as a bag of words and ignore important relationships between words, such as synonymy. The proposed FCDC algorithm utilizes the semantic relationship between words to creat…
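
To make the contrast between frequent items and frequent concepts concrete, here is a minimal sketch, not the paper's implementation. It assumes a tiny hand-written synonym table in place of a real lexical resource, and the names `SYNONYM_TO_CONCEPT`, `to_concepts`, `frequent_concepts`, and the `min_support` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical synonym table standing in for a lexical ontology (assumption,
# not taken from the paper): each surface word maps to one concept label.
SYNONYM_TO_CONCEPT = {
    "car": "automobile",
    "auto": "automobile",
    "automobile": "automobile",
    "physician": "doctor",
    "doctor": "doctor",
}


def to_concepts(tokens):
    """Map tokens to a set of concept labels; unknown words map to themselves."""
    return {SYNONYM_TO_CONCEPT.get(t.lower(), t.lower()) for t in tokens}


def frequent_concepts(docs, min_support):
    """Return concepts that appear in at least `min_support` (a fraction) of docs."""
    counts = Counter()
    for tokens in docs:
        # Each concept is counted once per document (document frequency).
        counts.update(to_concepts(tokens))
    threshold = min_support * len(docs)
    return {concept for concept, n in counts.items() if n >= threshold}


docs = [
    ["car", "engine"],
    ["auto", "repair"],
    ["physician", "visit"],
    ["automobile", "engine"],
]
result = frequent_concepts(docs, min_support=0.5)
```

In this toy corpus the words "car", "auto", and "automobile" each occur in only one document, so none of them would pass a 50% support threshold as plain items; once merged into a single concept they occur in three of four documents and do pass.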

Cited by 30 publications (20 citation statements)
References 18 publications
“…In [11], a new technique based on frequent concepts for document clustering is proposed. Frequent Concepts based Document Clustering (FCDC) algorithm utilizes the semantic relationship between words, explored using WordNet ontology, to create concepts.…”
Section: Review Of Semantic Driven Document Clustering Methods
Citation type: mentioning
confidence: 99%
“…But, taking into account synonyms and hypernyms, disambiguated only by PoS tags, is not successful in improving clustering effectiveness because of the noise produced by all the incorrect senses extracted from WordNet. A possible solution is proposed which uses a word-by-word disambiguation in order to choose the correct sense of a word in [11]. In [6] Clustering based on Frequent Word Sequences (CFWS) has been proposed.…”
Section: Overview Of Clustering Algorithms
Citation type: mentioning
confidence: 99%
“…The experiments showed that using the semantic WN concepts features were promising and outperformed the baseline BoW model. Also, Baghel and Dhir in [18] proposed a hierarchy clustering algorithm to cluster the documents based on the concepts representation. The concepts were extracted from WN using the FstC WSD strategy.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…A good stemmer should be able to convert different syntactic forms of a word into its normalized form, reduce the number of index terms, save memory and storage and may increase the performance of clustering algorithms to some extent; meanwhile it should try stemming. Porter Stemmer [27] is a widely applied method to stem documents. It is compact, simple and relatively accurate.…”
Section: A Document Preprocessing Stage
Citation type: mentioning
confidence: 99%