Time dependency of data quality for collaborative filtering algorithms

Pessemier, Toon De; Dooms, Simon; Deryckere, Tom; Martens, Luc

doi:10.1145/1864708.1864767

Cited by 15 publications

(10 citation statements)

References 4 publications


“…One exception to this is the matrix factorization algorithm in the MT dataset. As already observed before in the literature [De Pessemier et al 2010; Campos et al 2011], recommendation performance increases in the MovieLens datasets when more ratings are available, but this is not the case with the MovieTweetings dataset, where all the recommenders, except the matrix factorization method, decrease or maintain their performance. A similar result was observed in [Said et al 2009], where a dataset with several new items was used, which lowered the recommendation precision.…”

Section: Benchmarking Results (mentioning)

confidence: 54%

A Framework for Dataset Benchmarking and Its Application to a New Movie Rating Dataset

Dooms, Bellogín, Pessemier, et al. 2016

ACM Trans. Intell. Syst. Technol.

Self Cite


Rating datasets are of paramount importance in recommender systems research. They serve as input for recommendation algorithms, as simulation data, or for evaluation purposes. In the past, publicly accessible rating datasets were not abundantly available, leaving researchers no choice but to work with old and static datasets like MovieLens and Netflix. More recently, however, emerging trends such as social media and smartphones have been found to provide rich data sources that can be turned into valuable research datasets. While dataset availability is growing, a structured way of introducing and comparing new datasets is still lacking. In this work, we propose a five-step framework to introduce and benchmark new datasets in the recommender systems domain. We illustrate our framework on a new movie rating dataset, called MovieTweetings, collected from Twitter. Following our framework, we detail the origin of the dataset, provide basic descriptive statistics, investigate external validity, report the results of a number of reproducible benchmarks, and conclude by discussing some interesting advantages and appropriate research use cases.


“…In this paper, we adopt a testing methodology similar to the one adopted in [4]. For each dataset, all user historical selections are chronologically split into a training set and a testing set, with the most recent ones (10%) in testing set while the remaining (90%) as input data (Input data is not equivalent to the training set, as explained later on).…”

Section: Methods (mentioning)

confidence: 99%
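The chronological split described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the `(user_id, item_id, timestamp)` record layout, the function name `chronological_split`, and the choice to split the whole dataset globally rather than per user are all assumptions made here for the example.

```python
from typing import List, Tuple

# Hypothetical record layout: (user_id, item_id, timestamp).
Interaction = Tuple[str, str, int]


def chronological_split(
    interactions: List[Interaction], test_fraction: float = 0.1
) -> Tuple[List[Interaction], List[Interaction]]:
    """Split interactions by time: the most recent fraction becomes the test set.

    Assumption: the split is applied to the dataset as a whole, ordered by
    timestamp, so the last `test_fraction` of records (10% by default) are
    held out and the earlier records (90%) remain as input data.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")

    # Oldest first, newest last.
    ordered = sorted(interactions, key=lambda record: record[2])

    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test

    # Everything before the cut is older than (or as old as) everything after it.
    return ordered[:cut], ordered[cut:]
```

Splitting on time rather than at random keeps every held-out selection later than the data used as input, which is what an experiment on the value of older data needs.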

“…One related study on the impact of such timeliness of data (and the effect of inclusion of old data in particular) is the work of Pessemier et al [4]. Their work studied the impact of inclusion of older data on recommendation accuracy in neighbor-based CF algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

Effectiveness of the data generated on different time in latent factor model

Zheng 2013

Proceedings of the 7th ACM Conference on Recommender Systems


User selection data accumulates as time goes by. Although recent selections are usually assumed to have a higher impact on recommendation accuracy, empirical studies on this problem are limited. Whether old data can contribute to recommendation accuracy is still to be determined. On the one hand, changes in short-term user preference over time may limit their effectiveness in prediction, but on the other hand, one cannot rule out their potential for capturing long-term user preferences. The result is important for the system owner in determining which data are useful for making accurate recommendations. While there have been some related studies on the time dependency of data quality using neighbor-based CF methods (e.g., [4]), its effects remain unverified for other CF methods. In this paper, we study the effect of data generated over different time periods on recommendation precision using several popular model-based CF algorithms (latent factor models). Experimental results show that while more recent data expectedly have a larger impact, the usefulness of older data cannot be ignored as long as there are sufficient old samples. However, the addition of an insufficient amount of old data seems to have a negative impact.


“…The proper functioning of a recommender system depends on the availability of consistent, correct, and comprehensive data sources [6]. Specific personal preferences and user constraints, which are characteristic for the domain of traveling, emphasize the importance of data quality.…”

Section: Data Structure (mentioning)

confidence: 99%

Hybrid group recommendations for a travel service

Pessemier, Dhondt, Martens 2016

Multimed Tools Appl

Self Cite


Recommendation techniques have proven their usefulness as a tool to cope with the information overload problem in many classical domains such as movies, books, and music. Additional challenges for recommender systems emerge in the domain of tourism such as acquiring metadata and feedback, the sparsity of the rating matrix, user constraints, and the fact that traveling is often a group activity. This paper proposes a recommender system that offers personalized recommendations for travel destinations to individuals and groups. These recommendations are based on the users' rating profile, personal interests, and specific demands for their next destination. The recommendation algorithm is a hybrid approach combining a content-based, collaborative filtering, and knowledge-based solution. For groups of users, such as families or friends, individual recommendations are aggregated into group recommendations, with an additional opportunity for users to give feedback on these group recommendations. A group of test users evaluated the recommender system using a prototype web application. The results prove the usefulness of individual and group recommendations and show that users prefer the hybrid algorithm over each individual technique. This paper demonstrates the added value of various recommendation algorithms in terms of different quality aspects, compared to an unpersonalized list of the most-popular destinations.

