Abstract. We consider the issue of query performance and propose a novel method for automatically predicting the difficulty of a query. Unlike a number of existing techniques, which examine the ranked lists returned in response to perturbed versions of the query with respect to the given collection, or to perturbed versions of the collection with respect to the given query, our technique examines the ranked lists returned by multiple scoring functions (retrieval engines) for the given query and collection. In essence, we propose that the results returned by multiple retrieval engines will be relatively similar for "easy" queries but more diverse for "difficult" queries. By appropriately employing the Jensen-Shannon divergence to measure the "diversity" of the returned results, we demonstrate a methodology for predicting query difficulty whose performance exceeds existing state-of-the-art techniques on TREC collections, often remarkably so.
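The core quantity in the abstract above is the Jensen-Shannon divergence between the result lists of several engines. A minimal sketch of the idea follows; the conversion of a ranked list into a probability distribution via reciprocal-rank weighting is an illustrative assumption, not necessarily the weighting used in the paper.

```python
import math

def rank_distribution(ranked_list):
    # Assumption: weight the document at rank r by 1/r, then normalize,
    # so that top-ranked documents dominate the distribution.
    weights = {doc: 1.0 / (r + 1) for r, doc in enumerate(ranked_list)}
    total = sum(weights.values())
    return {doc: w / total for doc, w in weights.items()}

def js_divergence(p, q):
    # Jensen-Shannon divergence (base 2) between two discrete
    # distributions given as dicts; absent keys have probability 0.
    keys = set(p) | set(q)
    div = 0.0
    for k in keys:
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return div

def diversity(ranked_lists):
    # Mean pairwise JS divergence across the engines' result lists:
    # higher values suggest a harder query under the paper's hypothesis.
    dists = [rank_distribution(rl) for rl in ranked_lists]
    pairs = [(i, j) for i in range(len(dists))
             for j in range(i + 1, len(dists))]
    return sum(js_divergence(dists[i], dists[j]) for i, j in pairs) / len(pairs)
```

Identical lists yield a diversity of 0 ("easy" query); completely disjoint lists yield the maximum of 1 bit ("difficult" query).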
Background
The shape of the exposure-response curve for long-term ambient fine particulate (PM2.5) exposure and cause-specific mortality is poorly understood, especially for rural populations and underrepresented minorities.
Methods
We used hybrid machine learning and Cox proportional hazards models to assess the association of long-term PM2.5 exposure with specific causes of death for 53 million U.S. Medicare beneficiaries (aged ≥65) from 2000 to 2008. Models included strata for age, sex, race, and ZIP code and controlled for neighborhood socio-economic status (SES) in our main analyses, with approximately 4 billion person-months of follow-up, and additionally for the warm-season average of 1-h daily maximum ozone exposures in a sensitivity analysis. The impact of non-traffic PM2.5 on mortality was examined using two-stage models of PM2.5 and nitrogen dioxide (NO2).
Results
A 10 μg/m3 increase in 12-month average PM2.5 prior to death was associated with a 5% increase in all-cause mortality, as well as increases of 8.8%, 5.6%, and 2.5% in all cardiovascular disease (CVD), all respiratory, and all cancer deaths, respectively, in age-, gender-, race-, ZIP code-, and SES-adjusted models. PM2.5 exposures, however, were not associated with lung cancer mortality. Results were not sensitive to control for ozone exposures. PM2.5-mortality associations for CVD- and respiratory-related causes were positive and significant for beneficiaries irrespective of their sex, race, age, SES, and urbanicity, with no evidence of a lower threshold for response or of lower Risk Ratios (RRs) at low PM2.5 levels. Associations between PM2.5 and CVD and respiratory mortality were linear and were higher for younger, Black, and urban beneficiaries, but were largely similar by SES. Risks associated with non-traffic PM2.5 were lower than those for all PM2.5 and were null for respiratory and lung cancer-related deaths.
Conclusions
PM2.5 was associated with mortality from CVD, respiratory disease, and all cancers, but not lung cancer. PM2.5-associated risks of CVD and respiratory mortality were similar across PM2.5 levels, with no evidence of a threshold. Black, urban, and younger beneficiaries were most vulnerable to the long-term impacts of PM2.5 on mortality.
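The effect sizes in this abstract follow from the proportional-hazards assumption: the hazard ratio for a Δ-unit increase in exposure is exp(β·Δ), where β is the fitted log-hazard per unit. The sketch below only illustrates this scaling arithmetic with the reported 5% all-cause figure; it is not the authors' fitted model, and the 5 μg/m3 example increment is an arbitrary illustration.

```python
import math

def hr_for_increase(beta_per_unit, delta):
    # Cox proportional hazards: the hazard ratio for a delta-unit
    # increase in a linear exposure term is exp(beta * delta).
    return math.exp(beta_per_unit * delta)

# From the abstract: a 10 ug/m3 increase in 12-month average PM2.5
# corresponds to HR = 1.05 for all-cause mortality, which implies
# a log-hazard coefficient per 1 ug/m3 of:
beta = math.log(1.05) / 10.0

# Hypothetical rescaling: HR for a 5 ug/m3 increase under linearity.
hr_5 = hr_for_increase(beta, 5.0)
```

Note that this rescaling is only valid because the abstract reports the exposure-response curves as linear with no threshold.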
Information retrieval evaluation has typically been performed over several dozen queries, each judged to near-completeness. There has been a great deal of recent work on evaluation over much smaller judgment sets: how to select the best set of documents to judge and how to estimate evaluation measures when few judgments are available. In light of this, it should be possible to evaluate over many more queries without much more total judging effort. The Million Query Track at TREC 2007 used two document selection algorithms to acquire relevance judgments for more than 1,800 queries. We present results of the track, along with deeper analysis: investigating tradeoffs between the number of queries and number of judgments shows that, up to a point, evaluation over more queries with fewer judgments is more cost-effective and as reliable as fewer queries with more judgments. Total assessor effort can be reduced by 95% with no appreciable increase in evaluation errors.
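Estimating evaluation measures from a small, sampled set of judgments typically relies on inverse-inclusion-probability (Horvitz-Thompson-style) weighting. The sketch below is a schematic illustration of that idea for precision@k under an assumed sampling design; it is not the track's actual estimator, and the `judged` structure (document → relevance flag and inclusion probability) is a hypothetical representation.

```python
def estimated_precision_at_k(ranked_list, judged, k):
    # Horvitz-Thompson-style estimate of precision@k from a sampled
    # subset of judgments. `judged` maps doc -> (is_relevant, pi),
    # where pi is the probability that doc was selected for judging.
    # Unsampled documents contribute nothing; sampled relevant
    # documents are up-weighted by 1/pi to correct the sampling bias,
    # so the estimate is unbiased in expectation over the sampling.
    total = 0.0
    for doc in ranked_list[:k]:
        if doc in judged:
            rel, pi = judged[doc]
            if rel:
                total += 1.0 / pi
    return total / k
```

With complete judgments (all pi = 1) this reduces to exact precision@k; with sparse sampling, individual estimates are noisy but averaging over many queries recovers reliability, which is the tradeoff the track investigates.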
[Notebook version] The Million Query (1MQ) track ran for the second time in TREC 2008. The track is designed to serve two purposes: first, it is an exploration of ad-hoc retrieval over a large set of queries and a large collection of documents; second, it investigates questions of system evaluation, in particular whether it is better to evaluate using many shallow judgments or fewer thorough judgments.
Participants in the track ran 10,000 queries against a collection of 25 million documents. Section 1 describes how the corpus and queries were selected, details the submission formats, and provides a brief description of all submitted runs. Section 2 provides an overview of the judging process, including a sketch of how it alternated between two methods for selecting the small set of documents to be judged. Sections 3.1 and 3.2 provide an overview of those two selection methods, developed at UMass and NEU, respectively. In Section 4 we present some statistics about the judging process, such as the total number of queries judged, how many by each approach, and so on, as well as the overall results of the track. We present some additional results and analysis of the overall track in Section 5.
Phase I: Running Queries
The first phase of the track required that participating sites submit their retrieval runs.
Corpus
The 1MQ track used the so-called "terabyte" or "GOV2" collection of documents. This corpus is a collection of Web data crawled from Web sites in the .gov domain in early 2004. The collection is believed to include a large proportion of the .gov pages that were crawlable at that time, including HTML and text, plus the extracted text of PDF, Word, and PostScript files. Any document longer than 256 KB was truncated to that size at the time the collection was built. Binary files are not included as part of the collection, though they were captured separately for use in judging.
Abstract. Empirical modeling of the score distributions associated with retrieved documents is an essential task for many retrieval applications. In this work, we propose modeling the relevant documents' scores by a mixture of Gaussians and modeling the non-relevant scores by a Gamma distribution. Applying variational inference, we automatically trade off the goodness-of-fit against the complexity of the model. We test our model on traditional retrieval functions and actual search engines submitted to TREC. We demonstrate the utility of our model in inferring precision-recall curves. In all experiments our model outperforms the dominant exponential-Gaussian model.
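Once score densities for the relevant and non-relevant classes are fitted, a precision-recall curve follows directly from their survival functions and the prior probability of relevance. A minimal sketch on synthetic scores is below; for brevity it fits a single Gaussian to the relevant class by maximum likelihood (the paper uses a variational Gaussian *mixture*), and all distribution parameters here are made-up illustrations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic scores under the paper's modeling assumptions:
# relevant scores ~ Gaussian (one component, for brevity),
# non-relevant scores ~ Gamma. Parameters are illustrative.
rel_scores = rng.normal(loc=8.0, scale=1.0, size=2000)
nonrel_scores = rng.gamma(shape=2.0, scale=1.5, size=20000)

# Maximum-likelihood fits (a simplified stand-in for the paper's
# variational inference); floc=0 pins the Gamma's location at zero.
mu, sigma = stats.norm.fit(rel_scores)
a, loc, scale = stats.gamma.fit(nonrel_scores, floc=0.0)

# Prior probability that a retrieved document is relevant.
lam = len(rel_scores) / (len(rel_scores) + len(nonrel_scores))

def precision_recall_at(threshold):
    # Survival functions: fraction of each class scoring above threshold.
    s_rel = stats.norm.sf(threshold, mu, sigma)
    s_non = stats.gamma.sf(threshold, a, loc=loc, scale=scale)
    recall = s_rel
    precision = lam * s_rel / (lam * s_rel + (1 - lam) * s_non)
    return precision, recall
```

Sweeping the threshold over the score range traces out the inferred precision-recall curve: raising the threshold trades recall for precision, exactly as the model predicts.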