Incomplete rankings on a set of items {1, . . . , n} are orderings of the form a1 ≺ · · · ≺ ak, with {a1, . . . , ak} ⊂ {1, . . . , n} and k < n. Though they arise in many modern applications, only a few methods have been introduced to handle them, most of which represent an incomplete ranking by the set of all its possible linear extensions on {1, . . . , n}. The major purpose of this paper is to introduce a completely novel approach that treats incomplete rankings directly, representing them as injective words over {1, . . . , n}. Unexpectedly, operations on incomplete rankings have very simple equivalents in this setting, and the topological structure of the complex of injective words can be interpreted in a simple fashion from the perspective of ranking. We exploit this connection and use recent results from algebraic topology to construct a multiresolution analysis and develop a wavelet framework for incomplete rankings. Though purely combinatorial, this construction relies on the same ideas that underlie multiresolution analysis on a Euclidean space, and it makes it possible to localize the information related to rankings on each subset of items. It can be viewed as a crucial step toward nonlinear approximation of distributions of incomplete rankings and paves the way for many statistical applications, including preference data analysis and the design of recommender systems.
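The contrast drawn above, between an incomplete ranking as a single injective word and its representation by all linear extensions, can be made concrete with a minimal Python sketch. The helper `linear_extensions` is a hypothetical illustration (it is not from the paper, and it enumerates all n! permutations, which is only feasible for tiny n): it shows how the set-of-extensions representation grows while the injective-word representation stays a single tuple.

```python
from itertools import permutations

def linear_extensions(partial, items):
    """All full rankings of `items` consistent with the
    incomplete ranking a1 < ... < ak given as a tuple `partial`."""
    exts = []
    for perm in permutations(items):
        ranks = {a: i for i, a in enumerate(perm)}
        # keep `perm` only if it preserves the order of `partial`
        if all(ranks[partial[i]] < ranks[partial[i + 1]]
               for i in range(len(partial) - 1)):
            exts.append(perm)
    return exts

# The incomplete ranking 2 < 1 over {1, 2, 3} is just the injective word (2, 1) ...
word = (2, 1)
# ... whereas the classical representation is a whole set of full rankings:
exts = linear_extensions(word, (1, 2, 3))  # (2,1,3), (2,3,1), (3,2,1)
```

For k observed items out of n, the extension set has n!/(n-k)!-independent blow-up in size, while the injective word keeps the data at its original cardinality k.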
Data representing the preferences of users are a typical example of the big datasets that modern technologies, such as e-commerce portals, now permit to collect in an explicit or implicit fashion. Such data are highly complex, insofar as the number of items n on which users may possibly express their preferences is explosive, while the collection of items a given user actually examines and is capable of comparing is highly variable and of extremely low cardinality compared to n. The main purpose of this paper is to promote a new representation of preference data, viewed as incomplete rankings. In contrast to alternative approaches, the very nature of preference data is preserved by the "multiscale analysis" we propose, which identifies "scale" with the set of items over which preferences are expressed and whose construction relies on recent results in algebraic topology. The representation of preference data it provides shares similarities with wavelet multiresolution analysis on a Euclidean space and can be computed at a reasonable cost given the complexity of the original data. Beyond computational and theoretical advantages, the wavelet-like transform is shown to compress preference data into relatively few basis coefficients and thus facilitates statistical tasks such as distribution estimation or prediction. This is illustrated here by very encouraging empirical work based on popular benchmark real datasets.
This article is devoted to the problem of predicting the value taken by a random permutation Σ, describing the preferences of an individual over a set of numbered items {1, . . . , n} say, based on the observation of an input/explanatory r.v. X (e.g. characteristics of the individual), when error is measured by the Kendall τ distance. In the probabilistic formulation of the 'Learning to Order' problem we propose, which extends the framework for statistical Kemeny ranking aggregation developed in Korba et al. (2017), this boils down to recovering conditional Kemeny medians of Σ given X from i.i.d. training examples (X1, Σ1), . . . , (XN, ΣN). For this reason, this statistical learning problem is referred to here as ranking median regression. Our contribution is twofold. We first propose a probabilistic theory of ranking median regression: the set of optimal elements is characterized, the performance of empirical risk minimizers is investigated in this context, and situations where fast learning rates can be achieved are exhibited. Next we introduce the concept of local consensus/median, in order to derive efficient methods for ranking median regression. The major advantage of this local learning approach lies in its close connection with the widely studied Kemeny aggregation problem. From an algorithmic perspective, this makes it possible to build predictive rules for ranking median regression by implementing efficient techniques for (approximate) Kemeny median computation at a local level in a tractable manner. In particular, versions of k-nearest neighbor and tree-based methods, tailored to ranking median regression, are investigated. The accuracy of piecewise constant ranking median regression rules is studied under a specific smoothness assumption on the conditional distribution of Σ given X. The results of various numerical experiments are also displayed for illustration purposes.
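The two ingredients named above, the Kendall τ distance and the Kemeny median, can be sketched in a few lines of Python. This is a hypothetical brute-force illustration, not the approximate algorithms studied in the paper: `kendall_tau` counts discordant item pairs, and `kemeny_median` scans all n! permutations, which is tractable only for very small n.

```python
from itertools import combinations, permutations

def kendall_tau(sigma, pi):
    """Kendall tau distance: number of item pairs ranked in
    opposite order by the two full rankings (tuples of items)."""
    r1 = {a: i for i, a in enumerate(sigma)}
    r2 = {a: i for i, a in enumerate(pi)}
    return sum(1 for a, b in combinations(sigma, 2)
               if (r1[a] - r1[b]) * (r2[a] - r2[b]) < 0)

def kemeny_median(rankings):
    """Brute-force Kemeny median: the permutation minimizing the
    total Kendall tau distance to the sample (small n only)."""
    items = rankings[0]
    return min(permutations(items),
               key=lambda pi: sum(kendall_tau(s, pi) for s in rankings))

# A toy sample of three full rankings over {1, 2, 3}:
sample = [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
median = kemeny_median(sample)  # (1, 2, 3), total distance 2
```

In the local-learning approach described above, a computation of this kind (in practice, an approximate one) would be carried out within each neighborhood or tree cell rather than on the whole sample.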