2011
DOI: 10.1007/s10791-011-9173-9

The optimum clustering framework: implementing the cluster hypothesis

Abstract: Document clustering offers the potential of supporting users in interactive retrieval, especially when users have problems in specifying their information need precisely. In this paper, we present a theoretic foundation for optimum document clustering. The key idea is to base cluster analysis and evaluation on a set of queries, by defining documents as being similar if they are relevant to the same queries. Three components are essential within our optimum clustering framework, OCF: (1) a set of queries, (2) a pr…
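As a rough illustration of the key idea stated in the abstract (not the authors' implementation), the sketch below represents each document by its estimated relevance to a small set of queries and clusters documents that are relevant to the same queries. The toy relevance values, the use of scikit-learn's KMeans, and all names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows = documents, columns = queries; each entry is an assumed estimate of
# the probability that the document is relevant to the query.
doc_query_relevance = np.array([
    [0.9, 0.1, 0.0],   # doc 0: relevant to query 0
    [0.8, 0.2, 0.1],   # doc 1: relevant to query 0
    [0.0, 0.9, 0.7],   # doc 2: relevant to queries 1 and 2
    [0.1, 0.8, 0.9],   # doc 3: relevant to queries 1 and 2
])

# Documents that are relevant to the same queries end up in the same cluster.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_query_relevance)
print(labels)  # e.g. [0 0 1 1]
```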


Cited by 26 publications (12 citation statements)
References 53 publications (52 reference statements)
“…Furthermore, positioning the constituent documents of these clusters at the top of the result list yields highly effective retrieval performance; specifically, much better than that of state-of-the-art retrieval methods that rank documents directly [8,32,25,14,10].…”
Section: Introduction (mentioning)
confidence: 99%
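As a minimal sketch of the cluster-based reranking idea described in the statement above (not the exact method of the cited works), the snippet below groups an initial ranking into clusters, scores each cluster, and moves the documents of the best clusters to the top of the result list. The function name, the mean-score cluster scoring, and the data layout are illustrative assumptions.

```python
from collections import defaultdict

def rerank_by_clusters(ranked_docs, doc_to_cluster, doc_scores):
    """ranked_docs: initial ranking (best first); doc_to_cluster: cluster id per doc id;
    doc_scores: initial retrieval score per doc id."""
    # Group the initially retrieved documents by cluster.
    cluster_docs = defaultdict(list)
    for d in ranked_docs:
        cluster_docs[doc_to_cluster[d]].append(d)
    # Score each cluster, here simply by the mean retrieval score of its documents.
    cluster_score = {c: sum(doc_scores[d] for d in docs) / len(docs)
                     for c, docs in cluster_docs.items()}
    # Emit the documents of the best clusters first, so the constituent documents
    # of highly scored clusters are positioned at the top of the result list.
    reranked = []
    for c in sorted(cluster_score, key=cluster_score.get, reverse=True):
        reranked.extend(sorted(cluster_docs[c], key=doc_scores.get, reverse=True))
    return reranked
```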
“…Table 1 and Figure 3 below are the results of research conducted by Bustos and Pertusa, who carried out the precision, recall, F1 score, and Cohen's K process [16]. Fine-grained algorithm (FGA): FGA presumes the turned form of the cluster hypothesis [23], that is, the relevant documents returned in response to a query will be inclined to be similar to one another. FGA uses a combination of loci and relevant cluster concepts to efficiently form clusters.…”
Section: Results (mentioning)
confidence: 99%
“…Since this dataset is quite large, we corrected for it using random balanced under sampling, which resulted in a reduced dataset size with 4 million labeled samples. A snippet of clinical statements and classes can be seen in Figure 2 Fine-grained algorithm (FGA): FGA presumes the turned form of the cluster hypothesis [23], that is, the relevant documents returned in response to a query will be inclined to be similar to one another. FGA uses a combination of loci and relevant cluster concepts to efficiently form clusters.…”
mentioning
confidence: 99%
“…The relevance score of the j-th document with respect to a query is calculated using term frequency * inverse document frequency (tf*idf) (Fuhr et al. 2012), where the score depends on the frequency of term w in the document, the frequency of term w in the document collection, and a normalization controlling the importance of terms based on term frequency (Gormley and Tong 2015).…”
Section: Ranking-based Misinformation Detection (RMID) (mentioning)
confidence: 99%
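Read literally, the statement above describes a standard tf*idf weighting: term frequency in the document, rarity of the term in the collection, and a normalization that dampens repeated terms. The sketch below shows one common formulation of such a score; the exact weighting and normalization used in the cited work may differ, and all names and the parameter k here are illustrative assumptions.

```python
import math

def tfidf_score(query_terms, doc_terms, collection, k=1.2):
    """Score one document against a query with a saturated tf * idf weighting.
    query_terms: terms of the query; doc_terms: terms of the document;
    collection: list of documents (each a list of terms);
    k: normalization controlling how quickly repeated terms saturate."""
    n_docs = len(collection)
    score = 0.0
    for w in set(query_terms):
        tf = doc_terms.count(w)                    # frequency of term w in the document
        df = sum(1 for d in collection if w in d)  # documents in the collection containing w
        if tf == 0 or df == 0:
            continue
        idf = math.log(1 + n_docs / df)            # rarer terms contribute more
        score += (tf / (tf + k)) * idf             # saturated tf, weighted by idf
    return score
```

For example, tfidf_score(["cluster"], ["cluster", "analysis"], [["cluster", "analysis"], ["ranking"]]) returns a positive score for the first document and 0 for a document that contains no query term.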