In modern ranking problems, several disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the small amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function. One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries. As we demonstrate in simulation settings and real data examples, when semantically similar queries are available it is possible to gainfully use them when ranking with respect to a particular query. We describe and explore this phenomenon in the context of the bias-variance trade-off and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome.
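The bias-variance trade-off mentioned above can be illustrated with a toy calculation. The sketch below is not the model used in the work: it assumes a scalar quantity estimated by a sample mean, `n` labeled observations for the target query, `m` observations borrowed from a similar query whose mean is shifted by a hypothetical offset `delta`, and a common variance `sigma2`. All function and parameter names are illustrative.

```python
def mse_own(sigma2, n):
    """MSE of the mean of the n target-query observations alone (unbiased)."""
    return sigma2 / n


def mse_pooled(sigma2, n, m, delta):
    """MSE of the mean over n target and m borrowed observations.

    Borrowed observations are assumed to have a mean shifted by `delta`,
    so pooling introduces bias but lowers variance.
    """
    bias = m * delta / (n + m)
    variance = sigma2 / (n + m)
    return bias ** 2 + variance


# Hypothetical settings: 5 target observations, 20 borrowed ones.
close_query = mse_pooled(sigma2=1.0, n=5, m=20, delta=0.2)
distant_query = mse_pooled(sigma2=1.0, n=5, m=20, delta=1.0)
baseline = mse_own(sigma2=1.0, n=5)
```

Under these assumptions, borrowing from a close query (small `delta`) lowers the error relative to the baseline, while borrowing from a distant query raises it, which is the qualitative behavior the paragraph describes.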
Short-term increases in air pollution levels are linked to large adverse effects on health and productivity. However, existing regulatory monitoring systems lack the spatial or temporal resolution needed to capture localized events. This study uses a dense network of over 100 sensors, deployed across the city of Chicago, Illinois, to capture the spread of smoke from short-term structural fire events. Examining all large structural fires that occurred in the city over a year (N = 21), we characterize differences in PM$_{2.5}$ concentrations downwind versus upwind of the fires. On average, we observed increases of up to 10.7 $\mu$g/m$^{3}$ (95% CI 5.7–15.7) for sensors within 2 km and up to 7.7 $\mu$g/m$^{3}$ (95% CI 3.4–12.0) for sensors 2–5 km downwind of fires. Statistically significant elevated concentrations were evident as far as 5 km downwind of the location of the fire and persisted over approximately 2 h on average. This work shows how low-cost sensors can provide insight on local and short-term pollution events, enabling regulators to provide timely warnings to vulnerable populations.
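As a rough illustration of how a downwind-versus-upwind difference with a 95% confidence interval might be computed, the sketch below uses a paired mean difference with a normal approximation. The abstract does not state which estimator the study used, so the pairing, the `z = 1.96` multiplier, and the function name `mean_difference_ci` are all assumptions, and the readings are made-up values, not study data.

```python
import math
from statistics import mean, stdev


def mean_difference_ci(downwind, upwind, z=1.96):
    """Mean paired difference (downwind - upwind) with a normal-approximation CI."""
    if len(downwind) != len(upwind) or len(downwind) < 2:
        raise ValueError("need at least two paired readings")
    diffs = [d - u for d, u in zip(downwind, upwind)]
    center = mean(diffs)
    # Standard error of the mean difference, scaled by the z multiplier.
    half_width = z * stdev(diffs) / math.sqrt(len(diffs))
    return center, center - half_width, center + half_width


# Hypothetical PM2.5 readings in ug/m^3, one pair per fire event (not study data).
downwind = [18.0, 22.5, 15.0, 30.0, 19.5]
upwind = [10.0, 12.5, 11.0, 14.0, 12.5]
center, low, high = mean_difference_ci(downwind, upwind)
```

With only a few events, a t-based multiplier would give a wider and more appropriate interval than the normal approximation used here.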
Learning to rank, that is, producing a ranked list of items specific to a query and with respect to a set of supervisory items, is a problem of general interest. The setting we consider is one in which no analytic description of what constitutes a good ranking is available. Instead, we have a collection of representations and supervisory information consisting of a (target item, interesting items set) pair. We demonstrate analytically, in simulation, and in real data examples that learning to rank via combining representations using an integer linear program is effective when the supervision is as light as "these few items are similar to your item of interest." While this nomination task is of general interest, for specificity we present our methodology from the perspective of vertex nomination in graphs. The methodology described herein is model agnostic.
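To make the idea of combining representations concrete, here is a minimal sketch. It is not the integer linear program from the work: it swaps in a brute-force grid search over convex weights, and it assumes the objective is to minimize the worst rank among the supervisory items, which the abstract does not spell out. Each representation is assumed to supply a list of distances from the target item to every candidate item; all names and values are hypothetical.

```python
from itertools import product


def worst_rank(distances, supervisory):
    """Largest 1-based rank among supervisory items; ties count against them."""
    return max(
        sum(1 for d in distances if d <= distances[s]) for s in supervisory
    )


def combine(dist_by_rep, weights):
    """Weighted sum of the per-representation distance lists."""
    n_items = len(dist_by_rep[0])
    return [
        sum(w * dist[i] for w, dist in zip(weights, dist_by_rep))
        for i in range(n_items)
    ]


def search_weights(dist_by_rep, supervisory, steps=10):
    """Grid search over nonnegative weights summing to 1 (stand-in for the ILP)."""
    best_rank, best_w = None, None
    for counts in product(range(steps + 1), repeat=len(dist_by_rep)):
        if sum(counts) != steps:
            continue
        weights = [c / steps for c in counts]
        rank = worst_rank(combine(dist_by_rep, weights), supervisory)
        if best_rank is None or rank < best_rank:
            best_rank, best_w = rank, weights
    return best_rank, best_w


# Hypothetical distances from the target to five candidates in two representations.
rep_a = [0.1, 0.9, 0.6, 1.0, 1.0]
rep_b = [0.9, 0.1, 0.6, 1.0, 1.0]
supervisory = [0, 1]  # indices of the "interesting" items
best_rank, best_w = search_weights([rep_a, rep_b], supervisory)
```

In this toy input each representation alone places one supervisory item third, while a weighted combination places both in the top two. Grid search scales poorly with the number of representations, which is one reason an exact optimization formulation is attractive.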