Keyword-based document clustering

Kang, Seung-Shik

doi:10.3115/1118935.1118952

Cited by 18 publications

(9 citation statements)

References 7 publications


“…a query, traditional retrieval models like the vector space model, BM25 or language models are used. This way, the clustering process gets related to clustering under the bag-of-words model or keyword based clustering (Kang 2003;). 2.…”

Section: Related Work (mentioning)

confidence: 99%

The optimum clustering framework: implementing the cluster hypothesis

et al. 2011


Document clustering offers the potential of supporting users in interactive retrieval, especially when users have problems in specifying their information need precisely. In this paper, we present a theoretic foundation for optimum document clustering. The key idea is to base cluster analysis and evaluation on a set of queries, by defining documents as being similar if they are relevant to the same queries. Three components are essential within our optimum clustering framework, OCF: (1) a set of queries, (2) a probabilistic retrieval method, and (3) a document similarity metric. After introducing an appropriate validity measure, we define optimum clustering with respect to the estimates of the relevance probability for the query-document pairs under consideration. Moreover, we show that well-known clustering methods are implicitly based on the three components, but that they use heuristic design decisions for some of them. We argue that with our framework more targeted research for developing better document clustering methods becomes possible. Experimental results demonstrate the potential of our considerations.
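The abstract's central idea, that two documents are similar if they are relevant to the same queries, can be illustrated with a small sketch. This is not the OCF paper's actual similarity metric: the function name `query_based_similarity`, the toy probabilities, and the choice of a dot product over per-query relevance estimates are all assumptions made here for illustration.

```python
from itertools import combinations


def query_based_similarity(rel_probs):
    """Score each document pair by how strongly both are relevant to the same queries.

    rel_probs maps a document id to a list of estimated relevance
    probabilities, one per query (same query order for every document).
    The score is the dot product of the two lists (an illustrative choice).
    """
    sims = {}
    for a, b in combinations(sorted(rel_probs), 2):
        sims[(a, b)] = sum(p * q for p, q in zip(rel_probs[a], rel_probs[b]))
    return sims


# Hypothetical estimates for three documents over three queries.
rel_probs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.8, 0.2, 0.0],
    "d3": [0.0, 0.1, 0.9],
}
sims = query_based_similarity(rel_probs)
for pair, score in sims.items():
    print(pair, round(score, 2))
```

In this toy data, d1 and d2 are likely relevant to the same first query, so their score is much higher than that of d1 and d3; any standard clustering method could then be run on such pairwise scores.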


“…These tools have already been described and evaluated in our previously published work; therefore, here we do not go into detail about their functionalities. It has often been pointed out (Kang 2003; Nenadic et al. 2003; Hammouda and Kamel 2004) that terminological strings (e.g. multi-word sequences, or key phrases) are more informative features than are single words for representing the content of a document.…”

Section: Concept Extraction (mentioning)

confidence: 99%

Semantically interconnected social networks

Cucchiarelli, D’Antonio, Velardi 2011, Soc. Netw. Anal. Min.


Social network analysis aims to identify collaborations and helps people organize themselves through community participation and information sharing. The primary sources for social network modelling are explicit relationships such as co-authoring, citations, friendship, etc. However, to enable the integration of on-line community information and to fully describe the content and structure of community sites, secondary sources of information, such as documents, e-mails, blogs and discussions, can be exploited. In this paper we describe a methodology and a battery of tools to automatically extract from documents the relevant topics shared among community members and to analyse the evolution of the network also in terms of emergence and decay of collaboration themes. Experiments are conducted on a scientific network funded by the European Community, the INTEROP network of excellence, and on the United Kingdom research community in medical image understanding and analysis.


“…In order to retrieve the articles related to the videos we summarize, the keywords in the transcript-word set are sorted according to their word occurrences. The highest 25%-35% of the keywords are preserved (the percentage we use is the approximate lower bound suggested in [36]. It helps to decrease the search time without losing too much discriminating ability).…”

Section: Visual and Text Content Pre-analysis (mentioning)

confidence: 99%

A Novel Video Summarization Based on Mining the Story-Structure and Semantic Relations Among Concept Entities

Chen, Wang 2009, IEEE Trans. Multimedia

113


Video summarization techniques have been proposed for years to offer people a comprehensive understanding of the whole story in a video. Roughly speaking, existing approaches can be classified into two types: static storyboards and dynamic skimming. However, although these traditional methods give users brief summaries, they still do not provide a concept-organized and systematic view. In this paper, we present a structural video content browsing system and a novel summarization method that uses four kinds of entities (who, what, where, and when) to establish the framework of the video contents. With the assistance of this indexed information, the structure of the story can be built up according to the characters, the things, the places, and the time. Therefore, users can not only browse the video efficiently but also focus on what they are interested in via the browsing interface. To construct the fundamental system, we employ a maximum entropy criterion to integrate visual and text features extracted from video frames and speech transcripts, generating high-level concept entities. A novel concept expansion method is introduced to explore the associations among these entities. After constructing the relational graph, we exploit a graph entropy model to detect meaningful shots and relations, which serve as the indices for users. The results demonstrate that our system can achieve better performance and information coverage.
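The keyword-pruning step quoted in the citation statement for this paper (sort transcript keywords by occurrence count and keep the top 25%-35%) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `top_keywords`, the alphabetical tie-breaking, the rounding up of the cutoff, and the sample transcript are assumptions introduced here.

```python
import math
from collections import Counter


def top_keywords(words, fraction=0.25):
    """Return the most frequent distinct words, keeping roughly `fraction` of them.

    Words are ranked by descending occurrence count; ties are broken
    alphabetically (an assumption, the quoted passage does not say).
    """
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Round the cutoff up so that a non-empty input keeps at least one keyword.
    keep = max(1, math.ceil(len(ranked) * fraction))
    return [word for word, _ in ranked[:keep]]


# Hypothetical transcript-word set with 8 distinct words.
transcript = "storm coast storm flood rescue storm flood coast city mayor aid road".split()
print(top_keywords(transcript, fraction=0.25))
```

Raising `fraction` toward 0.35 keeps more query terms for article retrieval at the cost of longer searches, which is the trade-off the quoted passage describes.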

