Proceedings of the Sixth International Workshop on Information Retrieval With Asian Languages - 2003
DOI: 10.3115/1118935.1118952

Keyword-based document clustering

Abstract: Document clustering aggregates related documents into clusters by evaluating the similarity between documents and the representatives of clusters. Terms and their discriminating features are the clue to the clustering, and the discriminating features are based on term and document frequencies. Feature selection based on frequency statistics limits how far the clustering algorithm can be improved, because it does not consider the contents of the cluster objec…
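As a rough illustration of the frequency-based baseline the abstract describes (not the paper's own keyword-based method), the sketch below weights terms by term and document frequency and assigns each document to the most similar cluster representative. The TF-IDF weighting, the cosine measure, and the function names `tfidf_vectors`, `cosine`, and `assign` are assumptions chosen for the example; the abstract does not specify them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from tokenized documents.

    Assumption: raw term frequency times log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign(vectors, representatives):
    """Return, for each vector, the index of its most similar representative."""
    return [
        max(range(len(representatives)),
            key=lambda k: cosine(v, representatives[k]))
        for v in vectors
    ]


docs = [
    ["apple", "fruit"],
    ["apple", "juice"],
    ["car", "engine"],
    ["car", "wheel"],
]
vecs = tfidf_vectors(docs)
# Hypothetical choice: use the first and third documents as representatives.
labels = assign(vecs, [vecs[0], vecs[2]])
```

Here the representatives are simply two of the documents; a full clustering algorithm would also update the representatives, which this sketch leaves out.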

Cited by 18 publications (9 citation statements)
References: 7 publications
“…a query, traditional retrieval models like the vector space model, BM25 or language models are used. This way, the clustering process gets related to clustering under the bag-of-words model or keyword based clustering (Kang 2003;). 2.…”
Section: Related Work (mentioning)
confidence: 99%
“…These tools have already been described and evaluated in our previously published work; therefore, here we do not go into detail about their functionalities. It has often been pointed out (Kang 2003;Nenadic et al 2003;Hammouda and Kamel 2004) that terminological strings (e.g. multi-word sequences, or key phrases) are more informative features than are single words for representing the content of a document.…”
Section: Concept Extraction (mentioning)
confidence: 99%
“…In order to retrieve the articles related to the videos we summarize, the keywords in the transcript-word set are sorted according to their word occurrences. The highest 25%-35% of the keywords are preserved (the percentage we use is the approximate lower bound suggested in [36]. It helps to decrease the search time without losing too much discriminating ability).…”
Section: Visual and Text Content Pre-analysis (mentioning)
confidence: 99%
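
The keyword-pruning step quoted in the last statement above (sort keywords by occurrence, keep the highest 25%-35%) might look roughly like the following. This is a minimal sketch under assumptions: the fraction is applied to the number of distinct keywords, ties are broken alphabetically, and `top_keywords` is a hypothetical helper name; the quoted passage settles none of these.

```python
import math
from collections import Counter


def top_keywords(words, fraction=0.25):
    """Keep the most frequent `fraction` of distinct keywords.

    Assumptions: the fraction applies to distinct keywords (not tokens),
    ties are broken alphabetically, and at least one keyword is kept
    whenever the input is non-empty.
    """
    counts = Counter(words)
    # Sort by descending count, then alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = max(1, math.ceil(len(ranked) * fraction)) if ranked else 0
    return [word for word, _ in ranked[:keep]]


transcript = "a a a b b c d".split()
kept_quarter = top_keywords(transcript, fraction=0.25)
kept_half = top_keywords(transcript, fraction=0.5)
```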