The appropriateness of evaluation criteria and measures has been a subject of debate and a vital concern in the information retrieval evaluation literature. A study was conducted to investigate the appropriateness of 20 measures of interactive information retrieval performance, representing four major evaluation criteria. Among the 20 measures studied were the two best-known relevance-based measures of effectiveness, recall and precision. The user's judgment of information retrieval success was used as the criterion measure with which all 20 measures were to be correlated. A sample of 40 end-users with individual information problems from an academic environment was observed interacting with six professional intermediaries who searched on their behalf in large operational systems. Quantitative data, consisting of values for all measures studied, and verbal data, containing users' reasons for assigning certain values to selected measures, were collected. Statistical analysis of the quantitative data showed that precision, one of the most important traditional measures of effectiveness, is not significantly correlated with the user's judgment of success. Users appear to be more concerned with absolute recall than with precision, although absolute recall was not directly tested in the study. Four related measures of recall and precision were found to be significantly correlated with success, among them the user's satisfaction with the completeness of the search results and with the precision of the search. This article explores possible explanations for this outcome through content analysis of the users' verbal data. The analysis shows that high precision does not always mean high quality (relevancy, completeness, etc.) to users, because users' expectations differ. The user's purpose in obtaining the information is suggested as the primary cause of the high concern for recall. Implications for research and practice are discussed.
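The two classical effectiveness measures discussed above can be stated compactly. A minimal sketch (illustrative only, not the study's instrument; document identifiers and judgment sets are hypothetical):

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items the user judged relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of all relevant items that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

# Hypothetical search output and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
print(precision(retrieved, relevant))  # 2 of 4 retrieved are relevant -> 0.5
print(recall(retrieved, relevant))     # 2 of 3 relevant were retrieved -> 2/3
```

The study's finding is precisely that the first quantity, however carefully measured, need not track the user's own judgment of success.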
This paper presents an application of the model described in Part I to the evaluation of Web search engines by undergraduates. The study observed how 36 undergraduates used four major search engines to find information for their own individual problems and how they evaluated these engines based on actual interaction with them. User evaluation was based on 16 performance measures representing five evaluation criteria: relevance, efficiency, utility, user satisfaction, and connectivity. Non-performance (user-related) measures were also applied. Each participant searched his/her own topic on all four engines and provided satisfaction ratings for system features and interaction, along with reasons for satisfaction. Each also made relevance judgements of retrieved items in relation to his/her own information need and participated in post-search interviews to provide reactions to the search results and overall performance. The study found significant differences in precision (PR1), relative recall, user satisfaction with output display, time saving, value of search results, and overall performance among the four engines, as well as significant engine-by-discipline interactions on all these measures. In addition, the study found significant differences in user satisfaction with response time among the four engines, and a significant engine-by-discipline interaction in user satisfaction with the search interface. None of the four search engines dominated in every aspect of the multidimensional evaluation. Content analysis of the verbal data identified a number of user criteria and users' evaluative comments based on these criteria. Results from both the quantitative and the content analyses provide insight for system design and development, and useful feedback on the strengths and weaknesses of the search engines for system improvement.
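Because the total number of relevant pages on the Web is unknowable, comparative studies of this kind commonly compute relative recall: the relevant items one engine retrieved, divided by the pooled set of relevant items found by any of the engines for the same query. A minimal sketch of that pooling idea (illustrative, with hypothetical judgment sets; not the study's code):

```python
def relative_recall(engine_relevant, all_engines_relevant):
    """Relevant items this engine found, relative to the union of
    relevant items found by all engines for the same query."""
    pooled = set().union(*all_engines_relevant)  # pool judgments across engines
    if not pooled:
        return 0.0
    return len(set(engine_relevant) & pooled) / len(pooled)

# Hypothetical relevance judgments for one query across four engines.
judgments = [{"a", "b"}, {"b", "c"}, {"c"}, {"a", "d"}]
print(relative_recall(judgments[0], judgments))  # pooled = {a,b,c,d}; 2/4 -> 0.5
```

The pooled set stands in for the unknown full relevant set, so each engine's score is comparable only within the same query and engine pool.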
The project proposes and tests a comprehensive and systematic model of user evaluation of Web search engines. The project contains two parts. Part I describes the background and the model, including a set of criteria and measures and a method for implementation. It includes a literature review covering two periods: the early period (1995-1996) portrays the setting in which the model was developed, and the later period (1997-2000) places two applications of the model among contemporary evaluation work. Part II presents one of the applications, which investigated the evaluation of four major search engines by 36 undergraduates from three academic disciplines. It reports results from statistical analyses of quantitative data for the entire sample and across disciplines, and from content analysis of verbal data containing users' reasons for satisfaction. The proposed model aims to provide systematic feedback to engine developers or service providers for system improvement and to generate useful insight for system design and tool choice. The model can be applied to evaluating other compatible information retrieval (IR) systems or IR techniques. It intends to contribute to developing a theory of relevance that goes beyond topicality to include value and usefulness for designing user-oriented information retrieval systems.
The World Wide Web's search engines are the main tools for indexing and retrieval of Internet resources today. Comparison and evaluation of their performance are of great importance to system developers and information professionals, as well as end-users, for the improvement and development of better tools. The paper describes categories and special features of Web-based databases and compares them with traditional databases. It then presents a review of the literature on the testing and evaluation of Web-based search engines. Different methodologies and measures used in previous studies are described and their findings are summarised. The paper offers some evaluative comments on previous studies and suggests areas for future investigation, particularly the evaluation of Web-based search engines from the end-user's perspective.