Recall and precision are often used to evaluate the effectiveness of information retrieval systems. They are easy to define if there is a single query and if the retrieval result generated for the query is a linear ordering. However, when the retrieval results are weakly ordered, in the sense that several documents have an identical retrieval status value with respect to a query, some probabilistic notion of precision has to be introduced. Relevance probability, expected precision, and so forth, are some alternatives mentioned in the literature for this purpose. Furthermore, when many queries are to be evaluated and the retrieval results averaged over these queries, some method of interpolation of precision values at certain preselected recall levels is needed. The currently popular approaches for handling both a weak ordering and interpolation are found to be inconsistent, and the results obtained are not easy to interpret. Moreover, in cases where some alternatives are available, no comparative analysis that would facilitate the selection of a particular strategy has been provided. In this paper, we systematically investigate the various problems and issues associated with the use of recall and precision as measures of retrieval system performance. Our motivation is to provide a comparative analysis of methods available for defining precision in a probabilistic sense and to promote a better understanding of the various issues involved in retrieval performance evaluation.
When a query is submitted to a search engine, the search engine returns a dynamically generated result page containing the result records, each of which usually consists of a link to and/or snippet of a retrieved Web page. In addition, such a result page often also contains information irrelevant to the query, such as information related to the hosting site of the search engine and advertisements. In this paper, we present a technique for automatically producing wrappers that can be used to extract search result records from dynamically generated result pages returned by search engines. Automatic search result record extraction is very important for many applications that need to interact with search engines such as automatic construction and maintenance of metasearch engines and deep Web crawling. The novel aspect of the proposed technique is that it utilizes both the visual content features on the result page as displayed on a browser and the HTML tag structures of the HTML source file of the result page. Experimental results indicate that this technique can achieve very high extraction accuracy.
Similarity-based retrieval of images is an important task in many image database applications. A major class of users' requests requires retrieving those images in the database that are spatially similar to the query image. We propose an algorithm for computing the spatial similarity between two symbolic images. A symbolic image is a logical representation of the original image where the image objects are uniquely labeled with symbolic names. Spatial relationships in a symbolic image are represented as edges in a weighted graph referred to as spatial-orientation graph. Spatial similarity is then quantified in terms of the number of, as well as the extent to which, the edges of the spatial-orientation graph of the database image conform to the corresponding edges of the spatial-orientation graph of the query image. The proposed algorithm is robust in the sense that it can deal with translation, scale, and rotational variances in images. The algorithm has quadratic time complexity in terms of the total number of objects in both the database and query images. We also introduce the idea of quantifying a system's retrieval quality by having an expert specify the expected rank ordering with respect to each query for a set of test queries. This enables us to assess the quality of algorithms comprehensively for retrieval in image databases. The characteristics of the proposed algorithm are compared with those of the previously available algorithms using a testbed of images. The comparison demonstrated that our algorithm is not only more efficient but also provides a rank ordering of images that consistently matches with the expert's expected rank ordering.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with đź’™ for researchers
Part of the Research Solutions Family.