Abstract: The benefits of neural approaches are undisputed in many application areas. However, today's research practice in applied machine learning, where researchers often use a variety of baselines, datasets, and evaluation procedures, can make it difficult to understand how much progress is actually achieved through novel technical approaches. In this work, we focus on the fast-developing area of session-based recommendation and aim to contribute to a better understanding of what represents the state-of-the-art. To tha…
“…Therefore, progress is often claimed by comparing a complex neural model against another neural model, which is, however, not necessarily a strong baseline. Similar observations can be made for the area of session-based recommendation, where a recent method based on recurrent neural networks [16] is considered a competitive baseline, even though almost trivial methods are in most cases better [29,30].…”
Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of results or the choice of baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in recent years. Only 7 of them could be reproduced with reasonable effort. For these methods, however, it turned out that 6 of them can often be outperformed by comparatively simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area.
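To make the abstract's "comparably simple heuristic methods" concrete, the following is a minimal sketch of an item-based nearest-neighbor top-n recommender. The toy matrix and function names are illustrative assumptions, not taken from the paper; real baselines of this family additionally tune neighborhood sizes and similarity weighting.

```python
import numpy as np

# Toy implicit-feedback user-item matrix (rows: users, cols: items).
# Values and dimensions are purely illustrative.
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
], dtype=float)

def item_knn_scores(R, user, eps=1e-9):
    """Score unseen items for `user` via item-item cosine similarity."""
    norms = np.linalg.norm(R, axis=0) + eps
    sim = (R.T @ R) / np.outer(norms, norms)  # item-item cosine similarities
    np.fill_diagonal(sim, 0.0)                # ignore self-similarity
    scores = sim @ R[user]                    # aggregate over the user's items
    scores[R[user] > 0] = -np.inf             # mask already-seen items
    return scores

def top_n(R, user, n=2):
    """Return the n highest-scoring unseen items for `user`."""
    return np.argsort(item_knn_scores(R, user))[::-1][:n].tolist()
```

Despite having no learned parameters at all, well-tuned variants of this scheme are among the baselines that outperformed 6 of the 7 reproducible neural methods.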
“…Note that we only consider approaches that use a user-item rating matrix as an input. CNNs were also applied for session-based recommendation [46], where they however showed some limitations as well [29].…”
In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques and their latent factor models to neural approaches. However, given the proven power of latent factor models, some newer neural approaches incorporate them within more complex network architectures. One specific idea, recently put forward by several researchers, is to consider potential correlations between the latent factors, i.e., embeddings, by applying convolutions over the user-item interaction map. However, contrary to what is claimed in these articles, such interaction maps do not share the properties of images, where Convolutional Neural Networks (CNNs) are particularly useful. In this work, we show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers. Moreover, additional performance evaluations show that all of the examined recent CNN-based models are outperformed by existing non-neural machine learning techniques or traditional nearest-neighbor approaches. On a more general level, our work points to major methodological issues in recommender systems research.

CCS CONCEPTS: • Information systems → Recommender systems; • Computing methodologies → Neural networks.
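The construction criticized in this abstract can be sketched in a few lines: the "interaction map" is the outer product of a user and an item embedding, which these models then treat as a d×d image and convolve. All sizes and values below are illustrative assumptions, and the hand-rolled convolution stands in for a learned CNN layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                        # embedding size (illustrative)
p_u = rng.normal(size=d)     # stand-in user embedding
q_i = rng.normal(size=d)     # stand-in item embedding

# The "interaction map": outer product of the two embeddings,
# treated by the criticized models as if it were a d x d image.
E = np.outer(p_u, q_i)

def conv2d_valid(img, kernel):
    """Minimal single-channel 'valid' 2D convolution (illustrative only)."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for r in range(h):
        for c in range(w):
            out[r, c] = np.sum(img[r:r+kh, c:c+kw] * kernel)
    return out

kernel = rng.normal(size=(2, 2))
feature_map = conv2d_valid(E, kernel)  # shape (7, 7)
```

The analytical objection is visible here: permuting the embedding dimensions merely permutes the rows and columns of E without losing any information, whereas the same permutation would destroy a real image. The local spatial structure that makes convolutions effective on images therefore has no counterpart in the interaction map.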
“…Other measures such as the precision, recall, and nDCG may be used to evaluate the quality of recommendations. Recently, neural collaborative filtering has been proposed for recommendation [51,52]. It will be interesting to investigate the effectiveness of incorporating it in our system.…”
Collaborative filtering recommender systems traditionally recommend products to users based solely on the given user-item rating matrix. Data sparsity and scalability have long been two main concerns. In our previous work, an approach was proposed that addressed the scalability issue by clustering products using the content of the user-item rating matrix; however, it still suffered from both concerns. In this paper, we improve the approach by employing user comments to address data sparsity and scalability. Word2Vec is applied to produce item vectors, one per product, from the comments users made on their previously bought goods. Through the user-item rating matrix, user vectors for all customers are then produced. By clustering, products and users are partitioned into item groups and user groups, respectively, and recommendations to a user are made based on these groups. Experimental results show that both the inaccuracy caused by a sparse user-item rating matrix and the inefficiency due to an enormous amount of data can be much alleviated.
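The pipeline the abstract describes (item vectors from comments → user vectors via the rating matrix → clustering into groups) can be sketched as follows. This is a minimal illustration under stated assumptions: random vectors stand in for the Word2Vec output on comments, user vectors are taken as rating-weighted averages of item vectors, and a tiny k-means replaces whatever clustering algorithm the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, d, k = 6, 8, 4, 2   # illustrative sizes

# Stand-in for Word2Vec: in the paper, item vectors are learned from
# user comments; random vectors are used here purely for illustration.
item_vecs = rng.normal(size=(n_items, d))

# Sparse user-item rating matrix (0 = no rating, 1..5 = rating).
R = rng.integers(0, 2, size=(n_users, n_items)) * \
    rng.integers(1, 6, size=(n_users, n_items))

# User vectors: rating-weighted average of the rated items' vectors.
weights = R / np.maximum(R.sum(axis=1, keepdims=True), 1)
user_vecs = weights @ item_vecs

def kmeans(X, k, iters=20):
    """Tiny k-means used to partition items/users into groups."""
    centers = X[:k].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

item_groups = kmeans(item_vecs, k)   # item group per product
user_groups = kmeans(user_vecs, k)   # user group per customer
```

Recommendations can then be restricted to the item groups favored by a user's group, which is what makes the approach scale: similarity computations run over small groups rather than the full matrix.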