Charles L. A. Clarke scite author profile

Evaluation measures act as objective functions to be optimized by information retrieval systems. Such objective functions must accurately reflect user requirements, particularly when tuning IR systems and learning ranking functions. Ambiguity in queries and redundancy in retrieved documents are poorly reflected by current evaluation measures. In this paper, we present a framework for evaluation that systematically rewards novelty and diversity. We develop this framework into a specific evaluation measure, based on cumulative gain. We demonstrate the feasibility of our approach using a test collection based on the TREC question answering track.

Reciprocal rank fusion outperforms condorcet and individual rank learning methods

Cormack

2009

Reciprocal Rank Fusion (RRF), a simple method for combining the document rankings from multiple IR systems, consistently yields better results than any individual system, and better results than the standard method Condorcet Fuse. This result is demonstrated by using RRF to combine the results of several TREC experiments, and to build a meta-learner that ranks the LETOR 3 dataset better than any previously reported method.

RECIPROCAL RANK FUSION

While supervised learning-to-rank methods have garnered much attention of late, unsupervised methods are attractive because they require no training examples. In the search for such a method we came up with Reciprocal Rank Fusion (RRF) to serve as a baseline. We found that RRF, when used to combine the results of IR methods (including learning to rank), almost invariably improved on the best of the combined results. We also found that RRF consistently equaled or bettered other methods we tried, including established metaranking standards Condorcet Fuse and CombMNZ (cf. [4]).

RRF simply sorts the documents according to a naive scoring formula. Given a set D of documents to be ranked and a set of rankings R, each a permutation on 1..|D|, we compute

$$\mathrm{RRFscore}(d \in D) = \sum_{r \in R} \frac{1}{k + r(d)},$$

where k = 60 was fixed during a pilot investigation and not altered during subsequent validation. Our intuition in choosing this formula derived from the fact that while highly ranked documents are more important, the importance of lower-ranked documents does not vanish as it would were, say, an exponential function used.
The constant k mitigates the impact of high rankings by outlier systems. Condorcet Fuse combines rankings by sorting the documents according to the pairwise relation r(d1) < r(d2), which is determined for each (d1, d2) by majority vote among the input rankings. CombMNZ requires for each r a corresponding scoring function s_r : D → ℝ and a cutoff rank c which all contribute to the CombMNZ score:

$$\mathrm{CMNZscore}(d \in D) = \left|\{r \in R \mid r(d) \le c\}\right| \cdot \sum_{\{r \,\mid\, r(d) \le c\}} s_r(d).$$

We conducted four pilot experiments, each combining the results of 30 configurations of Wumpus Search applied to four different TREC collections. The results of the first, shown in Table 1, indicated that k = 60 was near-optimal, but that the choice was not critical. The results also showed, somewhat unexpectedly, that RRF bested competing approaches, as well as more sophisticated learning methods whose investigation was the original impetus for our work.

We repeated our experiment with four sets of submissions to TREC tasks; the particular sets were selected because they have been used in previous metaranking evaluation. It is worthy of note that, while our pilot runs used exactly the same set of Wumpus configurations...
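As a rough illustration of the RRF rule described in this abstract (each document scores the sum, over the input rankings, of 1/(k + r(d)), with k = 60), the following minimal Python sketch fuses several ranked lists. The function name `rrf_fuse`, the list-of-lists input format, and the tie-breaking by document id are assumptions made for this example and do not come from the paper. The sketch also lets a document missing from a ranking contribute nothing for that ranking, whereas the abstract assumes every ranking is a full permutation of D.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists with Reciprocal Rank Fusion.

    rankings: list of lists of document ids, best document first.
    k: smoothing constant (the abstract reports k = 60).
    Returns document ids sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, matching r(d) in the scoring formula.
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Assumption: ties are broken by document id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    runs = [
        ["a", "b", "c"],
        ["b", "a", "c"],
        ["b", "c", "a"],
    ]
    print(rrf_fuse(runs))
```

In this toy input, "b" is ranked first by two of the three lists and second by the other, so it ends up at the top of the fused list.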

Efficient and effective spam filtering and re-ranking for large web datasets

2011

The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general Web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam: pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset. We show that a simple content-based classifier with minimal training is efficient enough to rank the "spamminess" of every page in the dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as rank measures (estR-Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, using a set of "honeypot" queries, the labeling of training data may be reduced to an entirely automatic process. The results of classical information retrieval methods are particularly enhanced by filtering, moving from among the worst to among the best.

Galaxy And Mass Assembly: accurate panchromatic photometry from optical priors using lambdar

Wright

Robotham

Bourne

et al. 2016

Mon. Not. R. Astron. Soc.

158

137

We present the Lambda Adaptive Multi-Band Deblending Algorithm in R (lambdar), a novel code for calculating matched aperture photometry across images that are neither pixel- nor PSF-matched, using prior aperture definitions derived from high resolution optical imaging. The development of this program is motivated by the desire for consistent photometry and uncertainties across large ranges of photometric imaging, for use in calculating spectral energy distributions. We describe the program, specifically key features required for robust determination of panchromatic photometry: propagation of apertures to images with arbitrary resolution, local background estimation, aperture normalisation, uncertainty determination and propagation, and object deblending. Using simulated images, we demonstrate that the program is able to recover accurate photometric measurements in both high-resolution, low-confusion, and low-resolution, high-confusion, regimes. We apply the program to the 21-band photometric dataset from the Galaxy And Mass Assembly (GAMA) Panchromatic Data Release (PDR; Driver et al. 2016), which contains imaging spanning the far-UV to the far-IR. We compare photometry derived from lambdar with that presented in Driver et al. (2016), finding broad agreement between the datasets. Nonetheless, we demonstrate that the photometry from lambdar is superior to that from the GAMA PDR, as determined by a reduction in the outlier rate and intrinsic scatter of colours in the lambdar dataset. We similarly find a decrease in the outlier rate of stellar masses and star formation rates using lambdar photometry. Finally, we note an exceptional increase in the number of UV and mid-IR sources able to be constrained, which is accompanied by a significant increase in the mid-IR colour-colour parameter-space able to be explored.

Time-based calibration of effectiveness measures

2012

Frequency estimates for statistical word similarity measures

Terra

Clarke

2003

119

Statistical measures of word similarity have application in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative study of two methods for estimating word co-occurrence frequencies required by word similarity measures. Our frequency estimates are generated from a terabyte-sized corpus of Web data, and we study the impact of corpus size on the effectiveness of the measures. We base the evaluation on one TOEFL question set and two practice question sets, each consisting of a number of multiple choice questions seeking the best synonym for a given target word. For two question sets, a context for the target word is provided, and we examine a number of word similarity measures that exploit this context. Our best combination of similarity measure and frequency estimation method answers 6-8% more questions than the best results previously reported for the same question sets.

Exploiting redundancy in question answering

2001

Efficient construction of large test collections

1998
