Do Offline Metrics Predict Online Performance in Recommender Systems?

Krauth, Karl; Dean, Sarah; Na, Xiaona; Guo, Wenhui; Curmei, Mihaela; Recht, Benjamin; Jordan, Michael I.

doi:10.48550/arxiv.2011.07931

Cited by 9 publications

(18 citation statements)

References 38 publications


“…Preference models. We consider two preference models: one based on matrix factorization (MF) as well as a neighborhood-based model (KNN). We use the LibFM SGD implementation [Rendle, 2012] for the MF model and use the item-based k-nearest neighbors model implemented by Krauth et al. [2020]. For each dataset and recommender model we perform hyper-parameter tuning using a 10%-90% test-train split.…”

Section: Methods (mentioning)

confidence: 99%

“…Empirical studies of human behavior find mixed results on the relationship between recommendation and content diversity [Nguyen et al, 2014, Flaxman et al, 2016]. Simulation studies [Chaney et al, 2018, Yao et al, 2021, Krauth et al, 2020] and theoretical investigations [Dandekar et al, 2013] shed light on phenomena in simplified settings, showing how homogenization, popularity bias, performance, and polarization depend on assumed user behavior models. Even ensuring accuracy in sequential dynamic settings requires contending with closed-loop behaviors.…”

Section: Related Work (mentioning)

confidence: 99%

“…MovieLens 1 Million (ML-1M) dataset was downloaded from GroupLens via the RecLab [Krauth et al, 2020] interface. It contains 1 through 5 rating data of 6040 users for 3706 movies.…”

Section: B Datasets Model Training and Computing Infrastructure B1 De... (mentioning)

confidence: 99%


Quantifying Availability and Discovery in Recommender Systems via Stochastic Reachability

Curmei, Dean, Recht (2021). Preprint. Self cite.

In this work, we consider how preference models in interactive recommendation systems determine the availability of content and users' opportunities for discovery. We propose an evaluation procedure based on stochastic reachability to quantify the maximum probability of recommending a target piece of content to a user for a set of allowable strategic modifications. This framework allows us to compute an upper bound on the likelihood of recommendation with minimal assumptions about user behavior. Stochastic reachability can be used to detect biases in the availability of content and diagnose limitations in the opportunities for discovery granted to users. We show that this metric can be computed efficiently as a convex program for a variety of practical settings, and further argue that reachability is not inherently at odds with accuracy. We demonstrate evaluations of recommendation algorithms trained on large datasets of explicit and implicit ratings. Our results illustrate how preference models, selection rules, and user interventions impact reachability and how these effects can be distributed unevenly.


“…We generate the synthetic dataset using a modified version of the latent-static environment from the RecLab simulation platform [20].…”

Section: Empirical Setting and Methods (mentioning)

confidence: 99%

The Stereotyping Problem in Collaboratively Filtered Recommender Systems

Guo, Krauth, Jordan et al. (2021). Preprint. Self cite.

Recommender systems, and especially matrix factorization-based collaborative filtering algorithms, play a crucial role in mediating our access to online information. We show that such algorithms induce a particular kind of stereotyping: if preferences for a set of items are anticorrelated in the general user population, then those items may not be recommended together to a user, regardless of that user's preferences and ratings history. First, we introduce a notion of joint accessibility, which measures the extent to which a set of items can jointly be accessed by users. We then study joint accessibility under the standard factorization-based collaborative filtering framework, and provide theoretical necessary and sufficient conditions for when joint accessibility is violated. Moreover, we show that these conditions can easily be violated when the users are represented by a single feature vector. To improve joint accessibility, we further propose an alternative modelling fix, which is designed to capture the diverse multiple interests of each user using a multi-vector representation. We conduct extensive experiments on real and simulated datasets, demonstrating the stereotyping problem with standard single-vector matrix factorization models.


“…Given the usefulness of simulations, many simulation frameworks have been developed to study various fairness approaches for information retrieval systems; just to mention a few: MARS-Gym [139], ML-fairness-gym [41], Accordion [108], RecLab [90], RecSim NG [115], SIREN [23], T-RECS [105], RecoGym [137], AESim [59], Virtual-Taobao [146].…”

Section: Simulation and Applied Modeling To Study Long-term Effects A... (mentioning)

confidence: 99%

Fair ranking: a critical review, challenges, and future directions

Patro, Porcaro, Mitchell et al. (2022). Preprint.

Ranking, recommendation, and retrieval systems are widely used in online platforms and other societal systems, including e-commerce, media-streaming, admissions, gig platforms, and hiring. In the recent past, a large "fair ranking" research literature has been developed around making these systems fair to the individuals, providers, or content that are being ranked. Most of this literature defines fairness for a single instance of retrieval, or as a simple additive notion for multiple instances of retrievals over time. This work provides a critical overview of this literature, detailing the often context-specific concerns that such an approach misses: the gap between high ranking placements and true provider utility, spillovers and compounding effects over time, induced strategic incentives, and the effect of statistical uncertainty. We then provide a path forward for a more holistic and impact-oriented fair ranking research agenda, including methodological lessons from other fields and the role of the broader stakeholder community in overcoming data bottlenecks and designing effective regulatory environments.
