We introduce the notion of query substitution: generating a new query to replace a user's original search query. Our technique uses modifications based on typical substitutions web searchers make to their queries, so the new query is strongly related to the original, containing terms closely related to all of the original terms. This contrasts with query expansion through pseudo-relevance feedback, which is costly and can lead to query drift, and with query relaxation through Boolean or TF-IDF retrieval, which reduces the specificity of the query. We define a scale for evaluating query substitution and show that our method performs well at generating new queries related to the original ones. We build a model for selecting between candidates, using a number of features relating the query-candidate pair and fitting the model to human judgments of the relevance of query suggestions; this further improves the quality of the generated candidates. Experiments show that our techniques significantly increase coverage and effectiveness in the setting of sponsored search.
The goal of a recommender system is to suggest items of interest to a user based on the historical behavior of a community of users. Given detailed enough history, item-based collaborative filtering (CF) often performs as well as or better than almost any other recommendation method. However, in cold-start situations, where a user, an item, or the entire system is new, simple non-personalized recommendations often fare better. We improve the scalability and performance of a previous approach to handling cold-start situations that uses filterbots, or surrogate users that rate items based only on user or item attributes. We show that introducing a very small number of simple filterbots helps make CF algorithms more robust. In particular, adding just seven global filterbots improves both user-based and item-based CF in cold-start user, cold-start item, and cold-start system settings. Performance is better when data is scarce, performance is no worse when data is plentiful, and algorithm efficiency is negligibly affected. We systematically compare a non-personalized baseline, user-based CF, item-based CF, and our bot-augmented user- and item-based CF algorithms using three data sets (Yahoo! Movies, MovieLens, and EachMovie) with the normalized MAE metric in three types of cold-start situations. The advantage of our "naïve filterbot" approach is most pronounced for the Yahoo! data, the sparsest of the three data sets.
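The filterbot idea can be illustrated with a minimal sketch. The toy ratings, the genre attribute, and the 1–5 rating scale below are invented for illustration; the paper's actual bots and similarity weighting differ. A bot that rates every item purely from an item attribute creates co-rating overlap where no human overlap exists:

```python
import math

# Hypothetical toy data: user -> {item: rating}; genres are assumed attributes.
ratings = {
    "alice": {"m1": 5.0, "m2": 4.0},
    "bob":   {"m2": 3.0, "m3": 4.0},
}
genres = {"m1": "action", "m2": "action", "m3": "drama"}

def add_genre_filterbot(ratings, genres, genre, bot_name):
    """Surrogate user that rates every item from its attribute alone:
    high if the item matches the genre, low otherwise."""
    augmented = dict(ratings)
    augmented[bot_name] = {item: (5.0 if g == genre else 1.0)
                           for item, g in genres.items()}
    return augmented

def item_cosine(ratings, a, b):
    """Cosine similarity between two items over users who rated both."""
    users = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not users:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in users)
    na = math.sqrt(sum(ratings[u][a] ** 2 for u in users))
    nb = math.sqrt(sum(ratings[u][b] ** 2 for u in users))
    return dot / (na * nb)

augmented = add_genre_filterbot(ratings, genres, "action", "bot:action")
# Cold-start: m1 and m3 share no human co-rater, so plain item-based CF
# sees zero similarity; the bot's ratings connect them.
print(item_cosine(ratings, "m1", "m3"))
print(item_cosine(augmented, "m1", "m3"))
```

Because each bot rates deterministically from attributes, adding a handful of them costs almost nothing at training time, which matches the abstract's claim that algorithm efficiency is negligibly affected.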
We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as "dog", without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatiotemporal segments. The object seeds obtained using segment-level classifiers are further refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
Query logs, the patterns of activity left by millions of users, contain a wealth of information that can be mined to aid personalization. We perform a large-scale study of Yahoo! search engine logs, tracking 1.35 million browser cookies over a period of 6 months. We define metrics to address questions such as (1) How much history is available?, (2) How do users' topical interests vary, as reflected by their queries?, and (3) What can we learn from user clicks? We find that there is significantly more expected history for the user of a randomly picked query than for a randomly picked user. We show that users exhibit consistent topical interests that vary between users. We also see that user clicks indicate a variety of special interests. Our findings shed light on user activity and can inform future personalization efforts.
We present algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). Our fastest algorithm has a worst-case running time of O(mn), improving the recent bound of O(mn^2) obtained by Andersson and Vorobyov [2006]. We also present a randomized O(m^{1/2}n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an O(mn^2)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987].
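To make the object of study concrete: in a discounted DMDP, every action deterministically follows one edge of a graph, and the optimal value satisfies v(s) = max over edges (s, t) of r(s, t) + γ·v(t). The sketch below is plain value iteration on made-up rewards and a made-up discount factor, not the paper's O(mn) algorithm, which is considerably more involved:

```python
# A tiny discounted DMDP: state -> list of (reward, next_state) edges.
# Rewards, states, and GAMMA are invented for illustration.
GAMMA = 0.9
edges = {
    "a": [(1.0, "b"), (0.0, "a")],
    "b": [(2.0, "a")],
}

def value_iteration(edges, gamma, iters=500):
    """Iterate the Bellman operator v(s) = max_{(r,t)} r + gamma * v(t).
    Each sweep contracts the error by gamma, so 500 sweeps at 0.9
    converge far below floating-point noise for these magnitudes."""
    v = {s: 0.0 for s in edges}
    for _ in range(iters):
        v = {s: max(r + gamma * v[t] for r, t in outs)
             for s, outs in edges.items()}
    return v

v = value_iteration(edges, GAMMA)
```

Here the optimal strategy cycles a → b → a, giving v(a) = 1 + 0.9·(2 + 0.9·v(a)), i.e. v(a) = 2.8/0.19; the paper's contribution is computing such fixed points much faster than repeated sweeps.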
The following coins problem is a version of a multi-armed bandit problem where one has to select from among a set of objects, say classifiers, after an experimentation phase that is constrained by a time or cost budget. The question is how to spend the budget. The problem involves pure exploration only, differentiating it from typical multi-armed bandit problems involving an exploration/exploitation tradeoff [BF85]. It is an abstraction of the following scenarios: choosing from among a set of alternative treatments after a fixed number of clinical trials; determining the best parameter settings for a program given a deadline that only allows a fixed number of runs; or choosing a life partner in the bachelor/bachelorette TV show where time is limited. We are interested in the computational complexity of the coins problem and/or efficient algorithms with approximation guarantees.

The Coins Problem. We are given:
- A collection of n independent coins, indexed by the set I, where each coin is specified by a probability density function (prior) over its head probability. The priors of the different coins are independent, and they can differ from coin to coin.
- A budget b on the total number of coin flips.

We assume the tail and head outcomes correspond to receiving no reward and a fixed reward (1 unit), respectively. We are allowed a trial/learning period, constrained by the budget, for the sole purpose of experimenting with the coins, i.e., we do not collect rewards in this period. At the end of the period, we are allowed to pick only a single coin for all our future flips (reward collection). Let the actual head probability of coin i be θ_i. We define the regret from picking coin i to be θ* − θ_i, where θ* = max_{j∈I} θ_j. As we have only the densities, we seek to make coin-flip decisions and a final choice that minimize our expected regret.
It is easy to verify that when the budget is 0, the choice of coin that minimizes expected regret is one with maximum expected head probability over all the coins, i.e., one attaining max_i E(Θ_i), where Θ_i denotes the random variable corresponding to the head probability of coin i, and the expectation E(Θ_i) is taken over the density for coin i. A strategy is a prescription of which coin to flip given all the coins' flip outcomes so far. A strategy may be viewed as a finite directed rooted tree, where each node indicates
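The setup above can be sketched in a short simulation. Beta priors and the uniform budget split below are assumptions made for illustration; the problem statement allows arbitrary independent priors, and how to allocate the budget better than uniformly is exactly the open question:

```python
import random

# Beta(a, b) prior over a coin's head probability; its mean is a/(a+b),
# which is E(Theta_i) for the budget-0 rule discussed above.
def beta_mean(a, b):
    return a / (a + b)

def uniform_budget_strategy(priors, true_thetas, budget, rng):
    """Naive strategy: spread the flip budget evenly over the coins,
    update each Beta posterior by counting heads and tails, then apply
    the budget-0 rule to the posteriors (pick the max posterior mean)."""
    n = len(priors)
    posteriors = [list(p) for p in priors]
    for i, theta in enumerate(true_thetas):
        for _ in range(budget // n):
            if rng.random() < theta:
                posteriors[i][0] += 1   # head observed
            else:
                posteriors[i][1] += 1   # tail observed
    return max(range(n), key=lambda i: beta_mean(*posteriors[i]))

def regret(choice, true_thetas):
    # theta* - theta_choice, as defined in the text.
    return max(true_thetas) - true_thetas[choice]

rng = random.Random(0)
priors = [(1, 1), (1, 1), (1, 1)]   # uniform priors over [0, 1]
thetas = [0.3, 0.5, 0.8]            # actual head probabilities (unknown)
pick = uniform_budget_strategy(priors, thetas, budget=300, rng=rng)
print(pick, regret(pick, thetas))
```

With 100 flips per coin the posterior means concentrate near the true θ_i, so the naive strategy finds the best coin here; the interesting regime is a budget too small to flip every coin often, where the allocation itself matters.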