We introduce the notion of query substitution, that is, generating a new query to replace a user's original search query. Our technique uses modifications based on typical substitutions web searchers make to their queries. In this way the new query is strongly related to the original query, containing terms closely related to all of the original terms. This contrasts with query expansion through pseudo-relevance feedback, which is costly and can lead to query drift. This also contrasts with query relaxation through boolean or TFIDF retrieval, which reduces the specificity of the query. We define a scale for evaluating query substitution, and show that our method performs well at generating new queries related to the original queries. We build a model for selecting between candidates, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions. This further improves the quality of the candidates generated. Experiments show that our techniques significantly increase coverage and effectiveness in the setting of sponsored search.
The goal of a recommender system is to suggest items of interest to a user based on historical behavior of a community of users. Given detailed enough history, item-based collaborative filtering (CF) often performs as well or better than almost any other recommendation method. However, in cold-start situations-where a user, an item, or the entire system is new-simple non-personalized recommendations often fare better. We improve the scalability and performance of a previous approach to handling cold-start situations that uses filterbots, or surrogate users that rate items based only on user or item attributes. We show that introducing a very small number of simple filterbots helps make CF algorithms more robust. In particular, adding just seven global filterbots improves both user-based and item-based CF in cold-start user, cold-start item, and cold-start system settings. Performance is better when data is scarce, performance is no worse when data is plentiful, and algorithm efficiency is negligibly affected. We systematically compare a non-personalized baseline, user-based CF, item-based CF, and our bot-augmented user-and item-based CF algorithms using three data sets (Yahoo! Movies, MovieLens, and EachMovie) with the normalized MAE metric in three types of cold-start situations. The advantage of our "naïve filterbot" approach is most pronounced for the Yahoo! data, the sparsest of the three data sets.
Abstract. We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as "dog", without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments. The object seeds obtained using segment-level classifiers are further refined using graphcuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.