Abstract. The ability to effectively organize retrieval results becomes more important as the focus of Information Retrieval (IR) shifts towards interactive search processes. Automatic classification techniques are capable of providing the necessary information organization by arranging the retrieved data into groups of documents with common subjects.
In this paper, we compare classification methods from IR and Machine Learning (ML) for clustering search results. Issues such as document representation, classification algorithms, and cluster representation are discussed. We introduce several evaluation techniques and use them in preliminary experiments. These experiments indicate that the proposed techniques have promise, but it is clear that user experiments are required to carry out a more thorough evaluation.
This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC-9209623. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the sponsor. This material is based on work supported in part by NRaD Contract Number N66001-94-D-6054.
1 Introduction
An IR system typically produces a ranked list of documents in response to a user's query. These documents are presented to the user for examination and evaluation. Although the documents are ranked, there is significant potential benefit in providing additional structure in long retrieved lists.
The role of information organization becomes even more important in the interactive model of retrieval, where the focus is on the user's participation in a cycle of query formulation, presentation of search results, and query reformulation.
A natural alternative to ranking is to divide (or cluster) the retrieved set into groups of documents with common subjects.
For example, consider a situation in which the system is presented with a general query. The retrieval results would contain a wide variety of topics in that general area. An automatic classification tool could create classes of similar documents, allowing the user to focus on a particular topic. In this paper we consider the problem of designing and evaluating such a browsing tool for an existing IR system.
We begin by discussing the recent research on clustering in IR and ML. Surprisingly, only a few systems have used clustering methods for organizing retrieval results. Moreover, there is virtually no literature about attempts to evaluate these techniques. Clustering has also been studied in Machine Learning (ML) for a relatively long time, and a large number of algorithms have been developed. There have, however, been few applications of these techniques to IR [1].
We believe there are four major issues that need to be considered:
• the input of the classifier, or the document representations. In general, documents are treated as vectors of weight-term pairs. However, the questions of which terms to choose and whether to use the whole document...
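The representation of documents as vectors of weight-term pairs, mentioned above, can be illustrated with a minimal sketch. It is not taken from the paper: the tf-idf weighting, the whitespace tokenizer, and the function names `term_weight_vectors` and `cosine` are all assumptions chosen for illustration, and the text leaves open which terms and weights should actually be used.

```python
import math
from collections import Counter


def term_weight_vectors(docs):
    """Map each document to a {term: weight} dict.

    Assumption: weights are tf * (log(N / df) + 1); the source does not
    specify a weighting scheme, so this is only one common choice.
    """
    # Naive whitespace tokenization (an assumption, not the paper's method).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: tf * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical retrieved documents, used only to exercise the functions.
    docs = [
        "neural network training",
        "training neural network models",
        "stock market prices",
    ]
    vecs = term_weight_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A clustering algorithm would then group documents whose pairwise similarity is high. Documents that share no terms receive a similarity of zero under this representation, which is one reason the choice of terms matters.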
Performance analysis of an interactive visualization system generally requires an extensive user study, a method that is very expensive and that often yields inconclusive results. To conduct a successful user study, the researcher has to be well aware of the system's possibilities. We present a different kind of analysis. We show how the system's behavior and performance could be investigated off-line, without user intervention. Combined with a user study, such analysis may help the researcher to form an objective opinion of the system's abilities, that is, to isolate what part of the observed performance is attributable to the system rather than to the user's skill.
Motivation