Abstract. We consider the issue of query performance, and we propose a novel method for automatically predicting the difficulty of a query. Unlike a number of existing techniques which are based on examining the ranked lists returned in response to perturbed versions of the query with respect to the given collection or perturbed versions of the collection with respect to the given query, our technique is based on examining the ranked lists returned by multiple scoring functions (retrieval engines) with respect to the given query and collection. In essence, we propose that the results returned by multiple retrieval engines will be relatively similar for "easy" queries but more diverse for "difficult" queries. By appropriately employing Jensen-Shannon divergence to measure the "diversity" of the returned results, we demonstrate a methodology for predicting query difficulty whose performance exceeds existing state-of-the-art techniques on TREC collections, often remarkably so.
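The idea of measuring the "diversity" of several ranked lists with Jensen-Shannon divergence can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say how a ranked list is turned into a probability distribution, so the 1/rank weighting used here is an assumption, and the function names (`rank_distribution`, `js_divergence`, `query_diversity`) are hypothetical.

```python
import math
from typing import Dict, List


def rank_distribution(ranked: List[str], vocabulary: List[str]) -> Dict[str, float]:
    """Map a ranked list to a distribution over documents.

    Assumption (not from the abstract): a document at rank r gets weight 1/r,
    and documents that were not retrieved get weight 0.
    """
    if not ranked:
        raise ValueError("ranked list must not be empty")
    weights = {doc: 0.0 for doc in vocabulary}
    for rank, doc in enumerate(ranked, start=1):
        weights[doc] += 1.0 / rank
    total = sum(weights.values())
    return {doc: w / total for doc, w in weights.items()}


def entropy(p: Dict[str, float]) -> float:
    """Shannon entropy in bits, treating 0 * log(0) as 0."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)


def js_divergence(distributions: List[Dict[str, float]]) -> float:
    """Equal-weight Jensen-Shannon divergence among n distributions:
    entropy of the average distribution minus the average entropy."""
    n = len(distributions)
    docs = distributions[0].keys()
    mean = {d: sum(p[d] for p in distributions) / n for d in docs}
    return entropy(mean) - sum(entropy(p) for p in distributions) / n


def query_diversity(ranked_lists: List[List[str]]) -> float:
    """Diversity of the ranked lists returned by several engines for one query."""
    vocabulary = sorted({doc for lst in ranked_lists for doc in lst})
    dists = [rank_distribution(lst, vocabulary) for lst in ranked_lists]
    return js_divergence(dists)


# Three hypothetical engines that agree completely (an "easy" query).
easy = query_diversity([["d1", "d2", "d3"]] * 3)
# Three hypothetical engines with no documents in common (a "difficult" query).
hard = query_diversity([["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]])
print(easy, hard)
```

Under these assumptions, identical lists give a divergence of about 0 and fully disjoint lists from n engines give the maximum of log2(n) bits, so a larger value would be read as a sign of a more difficult query.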
Background The shape of the exposure-response curve for long-term ambient fine particulate (PM2.5) exposure and cause-specific mortality is poorly understood, especially for rural populations and underrepresented minorities.
Methods We used hybrid machine learning and Cox proportional hazard models to assess the association between long-term PM2.5 exposure and specific causes of death for 53 million U.S. Medicare beneficiaries (aged ≥65) from 2000 to 2008. Models included strata for age, sex, race, and ZIP code and controlled for neighborhood socio-economic status (SES) in our main analyses, with approximately 4 billion person-months of follow-up, and additionally for warm season average of 1-h daily maximum ozone exposures in a sensitivity analysis. The impact of non-traffic PM2.5 on mortality was examined using two-stage models of PM2.5 and nitrogen dioxide (NO2).
Results A 10 μg/m3 increase in 12-month average PM2.5 prior to death was associated with a 5% increase in all-cause mortality, as well as an 8.8, 5.6, and 2.5% increase in all cardiovascular disease (CVD)-, all respiratory-, and all cancer deaths, respectively, in age, gender, race, ZIP code, and SES-adjusted models. PM2.5 exposures, however, were not associated with lung cancer mortality. Results were not sensitive to control for ozone exposures. PM2.5-mortality associations for CVD- and respiratory-related causes were positive and significant for beneficiaries irrespective of their sex, race, age, SES and urbanicity, with no evidence of a lower threshold for response or of lower Risk Ratios (RRs) at low PM2.5 levels. Associations between PM2.5 and CVD and respiratory mortality were linear and were higher for younger, Black and urban beneficiaries, but were largely similar by SES. Risks associated with non-traffic PM2.5 were lower than those for all PM2.5 and were null for respiratory and lung cancer-related deaths.
Conclusions PM2.5 was associated with mortality from CVD, respiratory causes, and all cancers, but not lung cancer. PM2.5-associated risks of CVD and respiratory mortality were similar across PM2.5 levels, with no evidence of a threshold. Black, urban, and younger beneficiaries were most vulnerable to the long-term impacts of PM2.5 on mortality.
Information retrieval evaluation has typically been performed over several dozen queries, each judged to near-completeness. There has been a great deal of recent work on evaluation over much smaller judgment sets: how to select the best set of documents to judge and how to estimate evaluation measures when few judgments are available. In light of this, it should be possible to evaluate over many more queries without much more total judging effort. The Million Query Track at TREC 2007 used two document selection algorithms to acquire relevance judgments for more than 1,800 queries. We present results of the track, along with deeper analysis: investigating tradeoffs between the number of queries and number of judgments shows that, up to a point, evaluation over more queries with fewer judgments is more cost-effective and as reliable as fewer queries with more judgments. Total assessor effort can be reduced by 95% with no appreciable increase in evaluation errors.