Marc Boullé scite author profile

Comparing State-of-the-Art Collaborative Filtering Systems

Abstract. Collaborative filtering aims at helping users find items they should appreciate from huge catalogues. In that field, we can distinguish user-based, item-based and model-based approaches. For each of them, many options play a crucial role in their performance, in particular the similarity function defined between users or items, the number of neighbors considered for user- or item-based approaches, the number of clusters for model-based approaches using clustering, and the prediction function used. In this paper, we review the main collaborative filtering methods proposed in the literature and compare them on the same widely used real dataset, MovieLens, using the same widely used performance measure, Mean Absolute Error (MAE). This study thus allows us to highlight the advantages and drawbacks of each approach, and to propose some default options that we think should be used when using a given approach or designing a new one.
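The options listed in this abstract (similarity function, number of neighbors, prediction function, MAE) can be made concrete with a minimal user-based sketch. It assumes cosine similarity over co-rated items, a similarity-weighted average of the top-k neighbors, and a fallback to the user's mean rating. These are illustrative choices and not necessarily the defaults the paper recommends. The names `cosine`, `predict` and `mae` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two users, computed over their co-rated items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings, user, item, k=2):
    """Predict user's rating of item from the k most similar users who rated it.

    ratings maps user -> {item: rating}. The prediction function is a
    similarity-weighted average (an assumed, common choice).
    """
    scored = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]  # k is the "number of neighbors" option
    weight = sum(abs(s) for s, _ in top)
    if weight == 0:
        # No usable neighbors: fall back to the user's own mean rating (0.0 if none)
        own = list(ratings[user].values())
        return sum(own) / len(own) if own else 0.0
    return sum(s * r for s, r in top) / weight

def mae(predictions):
    """Mean Absolute Error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in predictions) / len(predictions)

ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
print(predict(ratings, "alice", "c", k=1))  # 4.0 (bob rates exactly like alice)
print(mae([(4.0, 5), (3.0, 2)]))  # 1.0
```

Swapping `cosine` for another similarity, changing `k`, or replacing the weighted average varies exactly the options the abstract identifies. Comparing those variants through `mae` on held-out ratings mirrors the kind of comparison the paper describes.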

MODL: A Bayes optimal discretization method for continuous attributes

Boullé

2006

Mach Learn

123

101

While real data often comes in mixed format, discrete and continuous, many supervised induction algorithms require discrete data. Efficient discretization of continuous attributes is an important problem that has effects on speed, accuracy and understandability of the induction models. In this paper, we propose a new discretization method, MODL, founded on a Bayesian approach. We introduce a space of discretization models and a prior distribution defined on this model space. This results in the definition of a Bayes optimal evaluation criterion of discretizations. We then propose a new super-linear optimization algorithm that manages to find near-optimal discretizations. Extensive comparative experiments on both real and synthetic data demonstrate the high inductive performance obtained by the new discretization method.
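The Bayes optimal criterion mentioned in this abstract can be evaluated directly from a table of counts. Let n be the number of instances, J the number of classes, I the number of intervals, n_i the size of interval i, and n_ij its count of class j. As we recall it from the published MODL paper, the cost is log n + log C(n+I-1, I-1) + sum_i log C(n_i+J-1, J-1) + sum_i log(n_i! / (n_i1! ... n_iJ!)), and lower is better. Please check this formula against the original article before relying on it. The sketch below only scores a given discretization and does not implement the optimization algorithm. The names `log_binomial` and `modl_cost` are hypothetical.

```python
from math import lgamma, log

def log_binomial(n, k):
    """log C(n, k), computed with lgamma to avoid huge factorials."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def modl_cost(table):
    """MODL cost of a discretization (formula as recalled; lower is better).

    table[i][j] = number of instances of class j in interval i.
    """
    intervals = len(table)
    classes = len(table[0])
    sizes = [sum(row) for row in table]
    n = sum(sizes)
    # Prior on the number of intervals and on their bounds
    cost = log(n) + log_binomial(n + intervals - 1, intervals - 1)
    for row, size in zip(table, sizes):
        # Prior on the class distribution within the interval
        cost += log_binomial(size + classes - 1, classes - 1)
        # Multinomial likelihood of the observed class counts
        cost += lgamma(size + 1) - sum(lgamma(c + 1) for c in row)
    return cost

# Perfectly separated classes: two pure intervals beat one mixed interval...
print(modl_cost([[20, 20]]) > modl_cost([[20, 0], [0, 20]]))  # True
# ...whereas for balanced noise the single interval wins.
print(modl_cost([[5, 5]]) < modl_cost([[3, 2], [2, 3]]))  # True
```

The second comparison illustrates how the prior terms penalize splits that the data do not support, so no user-supplied stopping parameter is needed to reject them.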

Analysis of the AutoML Challenge Series 2015–2018

Guyon

Sun-Hosoya

Boullé

et al. 2019

Regularization and Averaging of the Selective Naïve Bayes classifier

Boullé

Khiops: A Statistical Discretization Method of Continuous Attributes

Boullé

2004

Machine Learning

Abstract. In supervised machine learning, some algorithms are restricted to discrete data and have to discretize continuous attributes. Many discretization methods, based on statistical criteria, information content, or other specialized criteria, have been studied in the past. In this paper, we propose the discretization method Khiops, based on the chi-square statistic. In contrast with the related methods ChiMerge and ChiSplit, this method optimizes the chi-square criterion globally over the whole discretization domain and does not require any stopping criterion. A theoretical study followed by experiments demonstrates the robustness and the good predictive performance of the method.
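The chi-square statistic referred to in this abstract is computed on a contingency table whose rows are the intervals of a discretization and whose columns are the classes. The sketch below computes only that statistic. It does not reproduce the global optimization that Khiops performs over the whole discretization domain. The name `chi_square` is hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic of an intervals x classes contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / n
            if expected > 0:  # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat

print(chi_square([[20, 0], [0, 20]]))  # 40.0
print(chi_square([[5, 5], [5, 5]]))  # 0.0
```

ChiMerge applies this statistic locally, to pairs of adjacent intervals, when deciding which intervals to merge. The abstract's point is that Khiops instead evaluates the criterion on the whole discretization.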

A user parameter-free approach for mining robust sequential classification rules

et al. 2016

Compact Mathematical Formulation for Graph Partitioning

Boullé

2004

Optimization and Engineering

A Triclustering Approach for Time Evolving Graphs

Guigourès

Boullé

Rossi

2012

Abstract. This paper introduces a novel technique to track structures in time-evolving graphs. The method is based on a parameter-free approach for three-dimensional co-clustering of the source vertices, the target vertices and time. All three dimensions are segmented simultaneously in order to build time segments and clusters of vertices whose edge distributions are similar and evolve in the same way over the time segments. The main novelty of this approach is that the time segments are inferred directly from the evolution of the edge distribution between the vertices, so the user does not need to discretize time a priori. Experiments conducted on a synthetic dataset illustrate the good behaviour of the technique, and a study of a real-life dataset shows the potential of the proposed approach for exploratory data analysis.
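A candidate triclustering can be summarized as a three-dimensional table of edge counts indexed by source-vertex cluster, target-vertex cluster and time segment. Comparing the slices of that table across time segments is one way to inspect how edge distributions evolve. The sketch below only builds that table for a partition supplied by the caller. It does not implement the parameter-free approach the paper uses to infer the clusters and time segments. All names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def tricluster_counts(edges, source_cluster, target_cluster, time_cuts):
    """Count edges in each (source cluster, target cluster, time segment) cell.

    edges: iterable of (source, target, timestamp) triples.
    time_cuts: sorted segment boundaries; a timestamp t falls in segment
    bisect_right(time_cuts, t), so a boundary value starts a new segment.
    """
    counts = Counter()
    for source, target, t in edges:
        segment = bisect_right(time_cuts, t)
        counts[(source_cluster[source], target_cluster[target], segment)] += 1
    return counts

edges = [("a", "x", 1), ("a", "y", 2), ("b", "x", 12)]
counts = tricluster_counts(edges, {"a": 0, "b": 1}, {"x": 0, "y": 0}, [10])
print(sorted(counts.items()))  # [((0, 0, 0), 2), ((1, 0, 1), 1)]
```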

Marc Boullé

Comparing State-of-the-Art Collaborative Filtering Systems

MODL: A Bayes optimal discretization method for continuous attributes

Analysis of the AutoML Challenge Series 2015–2018

Regularization and Averaging of the Selective Naïve Bayes classifier

Khiops: A Statistical Discretization Method of Continuous Attributes

A user parameter-free approach for mining robust sequential classification rules

Compact Mathematical Formulation for Graph Partitioning

A Triclustering Approach for Time Evolving Graphs
