2011
DOI: 10.1007/s10791-011-9173-9

The optimum clustering framework: implementing the cluster hypothesis

Abstract: Document clustering offers the potential of supporting users in interactive retrieval, especially when users have problems in specifying their information need precisely. In this paper, we present a theoretic foundation for optimum document clustering. The key idea is to base cluster analysis and evaluation on a set of queries, by defining documents as being similar if they are relevant to the same queries. Three components are essential within our optimum clustering framework, OCF: (1) a set of queries, (2) a pr…
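As a rough illustration of the key idea stated in the abstract (not the authors' implementation), the sketch below represents each document by its estimated relevance to a small set of queries and clusters documents that are relevant to the same queries. The toy relevance values, the use of scikit-learn's KMeans, and all names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows = documents, columns = queries; each entry is an assumed estimate of
# the probability that the document is relevant to the query.
doc_query_relevance = np.array([
    [0.9, 0.1, 0.0],   # doc 0: relevant to query 0
    [0.8, 0.2, 0.1],   # doc 1: relevant to query 0
    [0.0, 0.9, 0.7],   # doc 2: relevant to queries 1 and 2
    [0.1, 0.8, 0.9],   # doc 3: relevant to queries 1 and 2
])

# Documents that are relevant to the same queries end up in the same cluster.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_query_relevance)
print(labels)  # e.g. [0 0 1 1]
```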


Cited by 26 publications (12 citation statements)
References 53 publications (52 reference statements)
“…Furthermore, positioning the constituent documents of these clusters at the top of the result list yields highly effective retrieval performance; specifically, much better than that of state-of-the-art retrieval methods that rank documents directly [8,32,25,14,10].…”
Section: Introduction (mentioning)
confidence: 99%
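As a minimal sketch of the cluster-based reranking idea described in the statement above (not the exact method of the cited works), the snippet below groups an initial ranking into clusters, scores each cluster, and moves the documents of the best clusters to the top of the result list. The function name, the mean-score cluster scoring, and the data layout are illustrative assumptions.

```python
from collections import defaultdict

def rerank_by_clusters(ranked_docs, doc_to_cluster, doc_scores):
    """ranked_docs: initial ranking (best first); doc_to_cluster: cluster id per doc id;
    doc_scores: initial retrieval score per doc id."""
    # Group the initially retrieved documents by cluster.
    cluster_docs = defaultdict(list)
    for d in ranked_docs:
        cluster_docs[doc_to_cluster[d]].append(d)
    # Score each cluster, here simply by the mean retrieval score of its documents.
    cluster_score = {c: sum(doc_scores[d] for d in docs) / len(docs)
                     for c, docs in cluster_docs.items()}
    # Emit the documents of the best clusters first, so the constituent documents
    # of highly scored clusters are positioned at the top of the result list.
    reranked = []
    for c in sorted(cluster_score, key=cluster_score.get, reverse=True):
        reranked.extend(sorted(cluster_docs[c], key=doc_scores.get, reverse=True))
    return reranked
```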
“…Table 1 and Figure 3 below are the results of research conducted by Bustos and Pertusa, who carried out the precision, recall, F1 score, and Cohen's K process [16]. Fine-grained algorithm (FGA): FGA presumes the turned form of the cluster hypothesis [23], that is, the relevant documents returned in response to a query will be inclined to be similar to one another. FGA uses a combination of loci and relevant cluster concepts to efficiently form clusters.…”
Section: Results (mentioning)
confidence: 99%
“…Since this dataset is quite large, we corrected for it using random balanced under sampling, which resulted in a reduced dataset size with 4 million labeled samples. A snippet of clinical statements and classes can be seen in Figure 2 Fine-grained algorithm (FGA): FGA presumes the turned form of the cluster hypothesis [23], that is, the relevant documents returned in response to a query will be inclined to be similar to one another. FGA uses a combination of loci and relevant cluster concepts to efficiently form clusters.…”
mentioning
confidence: 99%
“…The relevance score of the j-th document with respect to a query is calculated using term frequency * inverse document frequency (tf*idf) (Fuhr et al. 2012), where the score depends on the frequency of term w in the document, the frequency of term w in the document collection, and a normalization controlling the importance of terms based on term frequency (Gormley and Tong 2015).…”
Section: Ranking-based Misinformation Detection (RMID) (mentioning)
confidence: 99%
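Read literally, the statement above describes a standard tf*idf weighting: term frequency in the document, rarity of the term in the collection, and a normalization that dampens repeated terms. The sketch below shows one common formulation of such a score; the exact weighting and normalization used in the cited work may differ, and all names and the parameter k here are illustrative assumptions.

```python
import math

def tfidf_score(query_terms, doc_terms, collection, k=1.2):
    """Score one document against a query with a saturated tf * idf weighting.
    query_terms: terms of the query; doc_terms: terms of the document;
    collection: list of documents (each a list of terms);
    k: normalization controlling how quickly repeated terms saturate."""
    n_docs = len(collection)
    score = 0.0
    for w in set(query_terms):
        tf = doc_terms.count(w)                    # frequency of term w in the document
        df = sum(1 for d in collection if w in d)  # documents in the collection containing w
        if tf == 0 or df == 0:
            continue
        idf = math.log(1 + n_docs / df)            # rarer terms contribute more
        score += (tf / (tf + k)) * idf             # saturated tf, weighted by idf
    return score
```

For example, tfidf_score(["cluster"], ["cluster", "analysis"], [["cluster", "analysis"], ["ranking"]]) returns a positive score for the first document and 0 for a document that contains no query term.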