Boise State ScholarWorks
DOI: 10.18122/b2vx12

Towards Minimal Necessary Data: The Case for Analyzing Training Data Requirements of Recommender Algorithms

Abstract: This paper states the case for the principle of minimal necessary data: If two recommender algorithms achieve the same effectiveness, the better algorithm is the one that requires less user data. Applying this principle involves carrying out training data requirements analysis, which we argue should be adopted as best practice for the development and evaluation of recommender algorithms. We take the position that responsible recommendation is recommendation that serves the people whose data it uses. To minimiz…
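
As a rough illustration of the training data requirements analysis the abstract argues for, the sketch below trains the same recommender on increasing fractions of the available interactions, evaluates on a fixed validation set, and reports the smallest fraction whose effectiveness is within a tolerance of the best observed value. The `train_fn`/`eval_fn` interface, the fraction grid, and the tolerance are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of training data requirements analysis, not the paper's
# exact procedure. Assumes generic callables train_fn(interactions) -> model
# and eval_fn(model) -> effectiveness score (higher is better); names,
# fractions, and tolerance are illustrative assumptions.
import random

def minimal_necessary_fraction(interactions, train_fn, eval_fn,
                               fractions=(0.1, 0.25, 0.5, 0.75, 1.0),
                               tolerance=0.01, seed=0):
    """Return the smallest training fraction whose effectiveness is within
    `tolerance` of the best effectiveness observed across all fractions,
    together with the full fraction -> score curve."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)

    scores = {}
    for frac in fractions:
        subset = shuffled[: max(1, int(frac * len(shuffled)))]
        scores[frac] = eval_fn(train_fn(subset))

    best = max(scores.values())
    for frac in sorted(fractions):
        if scores[frac] >= best - tolerance:
            return frac, scores  # effectiveness has saturated at this fraction
    return max(fractions), scores
```

A smaller returned fraction at comparable effectiveness indicates a lower data requirement, which is the quantity the minimal-necessary-data principle asks us to compare across algorithms.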

Cited by 8 publications (10 citation statements)
References: 24 publications

“…The authors suggest a differential data analysis for understanding which data contributes to performance in recommender systems, and propose that less useful data should be discarded based on the analysis. While we fully share their motivation and view that performance saturates with data size (as empirically confirmed in [19]), we would like to highlight the post-hoc nature of their analysis. The choice of which particular data should be collected and eventually discarded is made after the data has been analysed.…”
Section: Background and Related Work
confidence: 76%
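
The differential data analysis referred to above can be sketched as an ablation: estimate each data slice's contribution by retraining without it and measuring the drop in effectiveness. The slicing scheme and the generic `train_fn`/`eval_fn` interface below are assumptions, not the authors' exact method.

```python
# A rough sketch of the idea behind differential data analysis: retrain the
# recommender without each data slice and measure the resulting change in
# effectiveness. Not the authors' exact procedure.
def slice_contributions(slices, train_fn, eval_fn):
    """slices: dict mapping a slice name to its list of interactions."""
    all_interactions = [x for part in slices.values() for x in part]
    baseline = eval_fn(train_fn(all_interactions))

    contributions = {}
    for name in slices:
        held_in = [x for other, part in slices.items() if other != name
                   for x in part]
        ablated = eval_fn(train_fn(held_in))
        # Positive contribution: removing this slice hurts effectiveness, so
        # the slice is useful; near-zero slices are candidates to discard.
        contributions[name] = baseline - ablated
    return contributions
```
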
“…We see this for all studied markets. This is on a par with the saturation effect reported in [19]. However, while they report a decline in accuracy with the squared-error metric on training data, we look at validation performance, with metrics purposely designed for measuring the quality of recommendations.…”
Section: Performance of Size of Training Data
confidence: 87%
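
The distinction drawn here, training-set squared error versus validation performance under recommendation-quality metrics, can be made concrete with a ranking metric such as NDCG@k. The sketch below is a generic binary-relevance implementation and is not taken from the citing study.

```python
# A minimal NDCG@k with binary relevance, as one example of a ranking-oriented
# validation metric for recommendations.
import math

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """ranked_items: recommended items in rank order for one user;
    relevant_items: held-out items the user actually interacted with."""
    relevant = set(relevant_items)
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```
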
“…Here, we zero in specifically on data minimization for recommender systems. In [23], the authors proposed to adopt training data requirements analysis to analyze and evaluate the trade-off between the amount of data that the system requires and the performance of the system. In [21], the authors proposed to extend the data minimization principles advocated in the GDPR and studied their effect on recommender systems.…”
Section: Data Minimization
confidence: 99%
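
Tying the trade-off described in [23] back to the principle stated in the abstract, the hypothetical helper below compares two algorithms by the minimal training fraction each needs at matched effectiveness. It reuses the `minimal_necessary_fraction` sketch above; `train_a`, `train_b`, and `eval_fn` are placeholders, not functions from the cited works.

```python
# A hypothetical comparison of two recommender algorithms under the minimal
# necessary data principle, reusing minimal_necessary_fraction() from the
# earlier sketch. train_a, train_b, and eval_fn stand in for real training
# and evaluation code.
def compare_data_requirements(interactions, train_a, train_b, eval_fn,
                              tolerance=0.01):
    frac_a, scores_a = minimal_necessary_fraction(
        interactions, train_a, eval_fn, tolerance=tolerance)
    frac_b, scores_b = minimal_necessary_fraction(
        interactions, train_b, eval_fn, tolerance=tolerance)
    best_a, best_b = max(scores_a.values()), max(scores_b.values())

    # The data comparison only applies when both algorithms reach comparable
    # effectiveness; otherwise the more effective algorithm wins outright.
    if abs(best_a - best_b) <= tolerance:
        if frac_a < frac_b:
            return "A"
        if frac_b < frac_a:
            return "B"
        return "tie"
    return "A" if best_a > best_b else "B"
```
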
“…Recently, the area of recommender systems has seen growing interest in data minimization. For example, in [51] data requirements analysis is proposed as best practice. The argument this work advances is straightforward: just as we avoid developing algorithms with unnecessary computational complexity, we should also avoid developing algorithms that need unnecessary data.…”
Section: Data
confidence: 99%