Many recommendation systems produce result sets with large numbers of highly similar items. Diversifying these results is often accomplished with heuristics, which are impoverished models of users' desire for diversity. However, integrating more complex statistical models of diversity into large-scale, mature systems is challenging. Without a good match between the model's definition of diversity and users' perception of diversity, the model can easily degrade users' perception of the recommendations. In this work we present a statistical model of diversity based on determinantal point processes (DPPs). We train this model from examples of user preferences with a simple procedure that can be integrated into large and complex production systems relatively easily. We use an approximate inference algorithm to serve the model at scale, and empirical results on live YouTube homepage traffic show that this model, coupled with a re-ranking algorithm, yields substantial short-and long-term increases in user engagement.
Broad-coverage annotated treebanks necessary to train parsers do not exist for many resource-poor languages. The wide availability of parallel text and accurate parsers in English has opened up the possibility of grammar induction through partial transfer across bitext. We consider generative and discriminative models for dependency grammar induction that use word-level alignments and a source language parser (English) to constrain the space of possible target trees. Unlike previous approaches, our framework does not require full projected parses, allowing partial, approximate transfer through linear expectation constraints on the space of distributions over trees. We consider several types of constraints that range from generic dependency conservation to language-specific annotation rules for auxiliary verb analysis. We evaluate our approach on Bulgarian and Spanish CoNLL shared task data and show that we consistently outperform unsupervised methods and can outperform supervised learning for limited training data.
Quantiles are often used for summarizing and understanding data. If that data is sensitive, it may be necessary to compute quantiles in a way that is differentially private, providing theoretical guarantees that the result does not reveal private information. However, in the common case where multiple quantiles are needed, existing differentially private algorithms scale poorly: they compute each quantile individually, splitting their privacy budget and thus decreasing accuracy. In this work we propose an instance of the exponential mechanism that simultaneously estimates m quantiles from n data points while guaranteeing differential privacy. The utility function is carefully structured to allow for an efficient implementation that avoids exponential dependence on m and returns estimates of all m quantiles in time O(mn 2 + m 2 n). Experiments show that our method significantly outperforms the current state of the art on both real and synthetic data while remaining efficient enough to be practical.
Differential privacy has become the standard for private data analysis, and an extensive literature now offers differentially private solutions to a wide variety of problems. However, translating these solutions into practical systems often requires confronting details that the literature ignores or abstracts away: users may contribute multiple records, the domain of possible records may be unknown, and the eventual system must scale to large volumes of data. Failure to carefully account for all three issues can severely impair a system's quality and usability.We present Plume, a system built to address these problems. We describe a number of sometimes subtle implementation issues and offer practical solutions that, together, make an industrial-scale system for differentially private data analysis possible. Plume is currently deployed at Google and is routinely used to process datasets with trillions of records.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.