Abstract. We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance parameters of fixed-queries trees and experimental results that support the analysis. Fixed-queries trees are particularly efficient for applications in which comparing two elements is expensive.
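The abstract does not spell out the structure's internals, so the following is only a minimal sketch of how the triangle inequality can prune a proximity search. It assumes, going by the name, that every node on the same level of the tree branches on the distance to the same fixed element (a "pivot"), and that distances take discrete values. The class name `FixedQueriesTree`, the method `range_query`, the choice of pivots, and the toy distance are all hypothetical.

```python
class FixedQueriesTree:
    """Illustrative sketch: level i of the tree branches on dist(x, pivots[i])."""

    def __init__(self, items, pivots, dist):
        # Assumption: pivots is a non-empty list and dist returns hashable,
        # discrete values (for example integers).
        self.pivots = pivots
        self.dist = dist
        self.root = {}
        for x in items:
            node = self.root
            # Inner levels map a distance value to a child dictionary.
            for p in pivots[:-1]:
                node = node.setdefault(dist(x, p), {})
            # The last level maps a distance value to a bucket of elements.
            node.setdefault(dist(x, pivots[-1]), []).append(x)

    def range_query(self, q, r):
        """Return all stored elements x with dist(q, x) <= r."""
        # One distance computation per level, shared by every node on it.
        qd = [self.dist(q, p) for p in self.pivots]
        found = []
        self._search(self.root, 0, qd, q, r, found)
        return found

    def _search(self, node, level, qd, q, r, found):
        last = level == len(self.pivots) - 1
        for k, child in node.items():
            # Triangle inequality: dist(q, x) >= |dist(q, p) - dist(x, p)|,
            # so a branch with |qd - k| > r cannot contain a match.
            if abs(qd[level] - k) > r:
                continue
            if last:
                # Only surviving bucket elements are compared with q directly.
                found.extend(x for x in child if self.dist(q, x) <= r)
            else:
                self._search(child, level + 1, qd, q, r, found)


# Toy usage: integers under absolute difference, with two arbitrary pivots.
tree = FixedQueriesTree([1, 4, 7, 10, 15], pivots=[0, 8],
                        dist=lambda a, b: abs(a - b))
matches = sorted(tree.range_query(6, 2))
```

In this sketch the query is compared against each pivot once and against the few elements that survive pruning, which is the kind of saving that matters when a single comparison is expensive.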
Abstract. This paper introduces a simple intersection algorithm for two sorted sequences that is fast on average. It is related to the multiple searching problem and to merging. We present the worst and average case analysis, showing that in the former, the complexity nicely adapts to the smallest list size. In the latter case, it performs fewer comparisons than the total number of elements in both inputs when n = αm (α > 1). Finally, we show its application to fast query processing in Web search engines, where large intersections, or differences, must be performed fast.
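The abstract does not describe the algorithm step by step. One way to intersect two sorted sequences without scanning every element is a divide-and-conquer scheme: binary-search the median of the smaller range in the larger one, then recurse on the two halves. The sketch below illustrates that idea under the assumption that each input is sorted and free of duplicates; the function names are hypothetical and the code is not claimed to match the paper's exact procedure.

```python
from bisect import bisect_left


def intersect_sorted(a, b):
    """Return the sorted intersection of two sorted lists of distinct items."""
    out = []
    _intersect(a, 0, len(a), b, 0, len(b), out)
    return out


def _intersect(a, alo, ahi, b, blo, bhi, out):
    # An empty range on either side has nothing in common with the other.
    if alo >= ahi or blo >= bhi:
        return
    # Take the median from the smaller range, so the binary search
    # always runs over the larger one.
    if ahi - alo > bhi - blo:
        a, alo, ahi, b, blo, bhi = b, blo, bhi, a, alo, ahi
    mid = (alo + ahi) // 2
    pivot = a[mid]
    # Find where the pivot is, or would be inserted, in the larger range.
    pos = bisect_left(b, pivot, blo, bhi)
    found = pos < bhi and b[pos] == pivot
    # Elements smaller than the pivot can only match to the left of pos.
    _intersect(a, alo, mid, b, blo, pos, out)
    if found:
        out.append(pivot)
    # Elements larger than the pivot can only match to the right of pos.
    _intersect(a, mid + 1, ahi, b, pos + 1 if found else pos, bhi, out)


common = intersect_sorted([1, 3, 5, 7, 9], [3, 4, 5, 6, 7, 8])
```

Because each binary search discards part of both ranges at once, a short list intersected with a much longer one touches only a small fraction of the longer list in this sketch.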
Abstract. In recent years crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to conduct experiments extremely fast, with good results and at low cost. However, as in any experiment, there are several details that can make an experiment work or fail. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as a crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and the results of a series of experiments using TREC 8 with a fixed budget. Our findings indicate that workers are as good as TREC experts, even providing detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.