Rémi Viola scite author profile

Rémi Viola

2 Publications

2 Citation Statements Received

28 Citation Statements Given


Affiliations

Ministère de l'Économie et des Finances, Laboratoire Hubert Curien, Claude Bernard University Lyon 1

Publications


An Adjusted Nearest Neighbor Algorithm Maximizing the F-Measure from Imbalanced Data

Viola, Emonet, Habrard et al. 2019


In this paper, we address the challenging problem of learning from imbalanced data using a Nearest-Neighbor (NN) algorithm. In this setting, the minority examples typically belong to the class of interest, which requires optimizing specific criteria such as the F-Measure. Based on simple geometrical ideas, we introduce an algorithm that reweights the distance between a query sample and any positive training example. This modifies the Voronoi regions and thus the decision boundaries of the NN algorithm. We provide a theoretical justification for the weighting scheme needed to reduce the False Negative rate while controlling the number of False Positives. We perform an extensive experimental study on many public imbalanced datasets, and also on large-scale non-public data from the French Ministry of Economy and Finance on a tax fraud detection task. The results show that our method is very effective and, interestingly, yields the best performance when combined with state-of-the-art sampling methods.
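The reweighting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the multiplicative factor `gamma`, the Euclidean distance, and the toy data are all assumptions made for illustration. Distances to positive (label 1) training points are multiplied by `gamma`, so a value below 1 enlarges the regions around positives; an F-Measure helper is included for evaluation.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def weighted_nn_predict(query: Point,
                        train: List[Tuple[Point, int]],
                        gamma: float,
                        k: int = 1) -> int:
    """k-NN vote where distances to positive examples are scaled by gamma.

    gamma is a hypothetical name for the reweighting factor; gamma < 1
    pulls positives closer to the query, gamma = 1 is plain k-NN.
    """
    scored = []
    for x, y in train:
        d = math.dist(query, x)      # Euclidean distance (an assumption)
        if y == 1:
            d *= gamma               # reweight only distances to positives
        scored.append((d, y))
    scored.sort(key=lambda t: t[0])
    votes = sum(y for _, y in scored[:k])
    return 1 if 2 * votes > k else 0


def f_measure(y_true: Sequence[int], y_pred: Sequence[int],
              beta: float = 1.0) -> float:
    """F-beta score computed from true/false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Toy data: one positive at the origin, one negative at x = 4.
toy_train = [((0.0, 0.0), 1), ((4.0, 0.0), 0)]
print(weighted_nn_predict((2.5, 0.0), toy_train, gamma=1.0))  # prints 0
print(weighted_nn_predict((2.5, 0.0), toy_train, gamma=0.5))  # prints 1
```

In the toy example the query is closer to the negative under the plain distance (1.5 versus 2.5), but halving the distance to the positive (1.25) flips the prediction, which is the boundary shift the abstract describes.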


Learning from Few Positives: a Provably Accurate Metric Learning Algorithm to Deal with Imbalanced Data

Viola, Emonet, Habrard et al. 2020


Learning from imbalanced data, where the positive examples are very scarce, remains a challenging task from both a theoretical and an algorithmic perspective. In this paper, we address this problem using a metric learning strategy. Unlike the state-of-the-art methods, our algorithm MLFP, for Metric Learning from Few Positives, learns a new representation that is used only when a test query is compared to a minority training example. From a geometric perspective, it artificially brings positive examples closer to the query without changing the distances to the negative (majority class) data. This strategy allows us to expand the decision boundaries around the positives, yielding a better F-Measure, a criterion well suited to imbalanced scenarios. Beyond the algorithmic contribution provided by MLFP, our paper presents generalization guarantees on the false positive and false negative rates. Extensive experiments conducted on several imbalanced datasets show the effectiveness of our method.
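The asymmetric use of the learned representation can be sketched as follows. This is only an illustration under stated assumptions, not MLFP itself: the linear map `L` is hand-picked rather than learned, and the function names, the Euclidean base distance, and the toy data are hypothetical. The map is applied only when the training point is positive, so distances to negatives stay unchanged.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def asymmetric_distance(query: Vector, x: Vector,
                        is_positive: bool, L: Matrix) -> float:
    """Distance that uses the linear map L only for positive training points.

    L stands in for a learned representation (here simply given);
    negatives are compared with the plain Euclidean distance.
    """
    diff = [q - xi for q, xi in zip(query, x)]
    if is_positive:
        # Project the difference vector: diff <- L @ diff
        diff = [sum(l * d for l, d in zip(row, diff)) for row in L]
    return math.sqrt(sum(d * d for d in diff))


def predict_1nn(query: Vector,
                train: List[Tuple[Vector, int]],
                L: Matrix) -> int:
    """1-NN prediction under the asymmetric distance."""
    nearest = min(
        train,
        key=lambda item: asymmetric_distance(query, item[0], item[1] == 1, L),
    )
    return nearest[1]


# Toy data: one positive at the origin, one negative at x = 4.
toy_train = [((0.0, 0.0), 1), ((4.0, 0.0), 0)]
identity = [[1.0, 0.0], [0.0, 1.0]]   # no change: plain 1-NN
shrink_x = [[0.5, 0.0], [0.0, 1.0]]   # hand-picked map, halves the first axis

print(predict_1nn((2.5, 0.0), toy_train, identity))  # prints 0
print(predict_1nn((2.5, 0.0), toy_train, shrink_x))  # prints 1
```

With the identity map the query is nearer to the negative (1.5 versus 2.5). With the shrinking map the distance to the positive drops to 1.25 while the distance to the negative is untouched, so the positive wins.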


