2010
DOI: 10.5120/1640-2204

Document Clustering based on Topic Maps

Abstract: The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next challenge lies in performing clustering based on the semantic content of the documents. The problem of document clustering has two main components: (1) to represent the document in a form that inherently captures the semantics of the text. This may also help to reduce…

Citations: cited by 10 publications (13 citation statements)
References: references 13 publications (8 reference statements)
“…Increased cluster purity clearly establishes the fact that the features extracted from the three representations capture the semantics of the documents. The three approaches FIHC [6], CFWS [10] and TMHC [12] produced F-measure for the data sets (See Table IV). The proposed approach clearly had shown improvement in most of test cases. This is due to the fact that the multiple representations of documents in the collection capture the semantics in a better way, and are able to produce high FMeasure which is an indication of balance precision and recall (See Figure 6).…”
Section: Results (mentioning)
confidence: 99%
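The statement above describes the F-measure as a balance of precision and recall. As a rough illustration, the sketch below computes a class-weighted, best-match F-measure for a flat clustering. This particular definition is an assumption chosen for illustration; the quoted text does not say which variant the citing paper used. The function name `clustering_f_measure` and its arguments are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted best-match F-measure (assumed definition, for illustration).

    For each true class, take the best F score over all clusters, then
    average those scores weighted by class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n == 0:
        return 0.0

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu  # share of the cluster that belongs to the class
            recall = n_both / n_cls     # share of the class captured by the cluster
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
        total += (n_cls / n) * best
    return total
```

A clustering that reproduces the true classes exactly scores 1.0 under this definition, while merging all documents into one cluster lowers the score because precision drops.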
“…It also introduced a novel similarity measure based on common features of the two corresponding graphs of the documents. One more recent approach to capture semantic representation of documents in document representation model is introduced in [12] in which the authors proposed a topic maps based representation by using an online tool Wandora for extracting topics from a document. They also reported encouraging results for document clustering based on semantic notions.…”
Section: The Literature Review (mentioning)
confidence: 99%
“…Ahmadi et al [14] proved that topic model based clustering methods generally achieve better results than only applying traditional clustering algorithms like the K-means. LDA has been used in many papers for representation and dimensionality reduction of text documents, as well as for uncovering semantic relations in the text [15]. Ma et al [16] used LDA for document representation and identification of the most significant topics, the K-means++ algorithm was used to define the initial centers of the clusters and the K-means algorithm was used to form the final clusters.…”
Section: Topic Modeling In Document Clustering (mentioning)
confidence: 99%
“…An alternate approach was taken by Rafi et al [18] who introduced a new document representation model based on the compact topic maps that are present in a document.…”
Section: Cluster Labels (mentioning)
confidence: 99%