2019
DOI: 10.1007/978-3-030-13453-2_4

Detecting Potential Local Adversarial Examples for Human-Interpretable Defense

Abstract: Machine learning models are increasingly used in industry to make decisions such as credit insurance approval. Some people may be tempted to manipulate specific variables, such as age or salary, in order to get a better chance of approval. In this ongoing work, we propose to discuss, with a first proposition, the issue of detecting a potential local adversarial example on classical tabular data by providing a human expert with the locally critical features for the classifier's decision, in order to co…
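The abstract is truncated, so the paper's exact detection procedure is not spelled out here. As a rough, hypothetical illustration of "providing the locally critical features" of a tabular decision to an expert, one could score each feature by how strongly perturbing it alone moves the model's predicted probability; everything below (model, data, the local_criticality helper) is an assumption for illustration, not the paper's method.

```python
# Hypothetical sketch: surface the locally critical features of a tabular
# classifier's decision. NOT the paper's method; a simple per-feature
# perturbation sensitivity is used as a stand-in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

def local_criticality(clf, x, scales, n_draws=200, seed=0):
    """Score each feature by how much perturbing it alone moves the
    predicted probability of the decided class around instance x."""
    rng = np.random.default_rng(seed)
    base = clf.predict_proba(x[None, :])[0]
    cls = int(base.argmax())
    scores = np.zeros(len(x))
    for j in range(len(x)):
        Xp = np.tile(x, (n_draws, 1))
        Xp[:, j] += rng.normal(0.0, scales[j], size=n_draws)
        scores[j] = np.abs(clf.predict_proba(Xp)[:, cls] - base[cls]).mean()
    return cls, scores

x0 = X[0]
cls, scores = local_criticality(clf, x0, scales=X.std(axis=0))
for j in np.argsort(scores)[::-1][:3]:      # top-3 locally critical features
    print(f"feature {j}: local criticality {scores[j]:.3f}")
```

An expert shown these top-ranked features can then judge whether the decisive variables (e.g. age or salary) look plausibly manipulated.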

Cited by 3 publications (3 citation statements)
References 6 publications
“…DP 1 states it is essential to equip fraud experts with an understanding of the capabilities and limitations of AI. Probabilities provide an explicit visualization of an AI model's confidence in its predictions on a trained dataset [47,32]. Moreover, the method of adversarial explanations enables experts to assess the cases that would affect legitimate predictions.…”
Section: Quality of Design Principles Instantiation Through a Simulation (mentioning, confidence: 99%)
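As a minimal sketch of the probability point, a scikit-learn-style classifier already exposes class probabilities that can be surfaced as an explicit confidence signal for the expert; the model and data below are placeholders, not from the cited work.

```python
# Hypothetical sketch: pair each prediction with the model's class probability
# so a fraud expert can see the model's confidence (scikit-learn API assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X[:1])[0]     # e.g. array([0.91, 0.09])
print(f"prediction: {model.classes_[proba.argmax()]}"
      f"  (confidence: {proba.max():.0%})")
```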
“…Namely, what we propose to call decision-boundary-centered explanations. While LIME illustrates which features contribute to an instance's prediction, Local Adversarial Detection (LAD) [4] and Local Surrogate [3] yield a feature attribution that is relevant at the local decision boundary. To do so, the decision boundary must first be found, and a surrogate is then trained on instances located around it.…”
Section: Counterfactual Explanation (mentioning, confidence: 99%)
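A minimal sketch of the two-step recipe this statement describes: first locate a point on the local decision boundary (here by bisecting between the instance and any differently classified instance), then fit a linear surrogate on samples drawn around that point. The helper names, the bisection strategy, and the sampling scale are assumptions for illustration, not the exact procedure of LAD [4] or Local Surrogate [3].

```python
# Hypothetical sketch: (1) find a boundary point, (2) fit a surrogate there.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

def boundary_point(f, x_in, x_out, steps=40):
    """Bisect between an instance and a differently classified one
    until we sit (approximately) on the decision boundary."""
    y_in = f.predict(x_in[None, :])[0]
    lo, hi = x_in.copy(), x_out.copy()
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f.predict(mid[None, :])[0] == y_in:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x0 = X[0]
x_other = next(x for x in X if black_box.predict(x[None, :])[0]
               != black_box.predict(x0[None, :])[0])
b = boundary_point(black_box, x0, x_other)

# (2) sample around the boundary point and fit the surrogate there;
# its coefficients are the boundary-local feature attribution.
rng = np.random.default_rng(0)
Z = b + rng.normal(0.0, 0.3, size=(500, X.shape[1]))
surrogate = LogisticRegression().fit(Z, black_box.predict(Z))
print("boundary-local feature attribution:", surrogate.coef_[0])
```

Unlike a LIME-style surrogate centered on the instance itself, the fit here is centered on the boundary, so the coefficients describe what matters where the decision actually flips.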
“…As mentioned in Section 2.2, this can be formulated as an optimization problem. Alternatively, random approaches similar to [1] or [4] can be used. Randomly sampling instances in a neighbourhood of the instance x_0 can be very expensive, as the counterfactual might lie far away in the feature space of possible instances.…”
Section: Phase 1: Finding the First Counterfactual (mentioning, confidence: 99%)
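A sketch of the random alternative mentioned here, in the spirit of [1] and [4] but reproducing neither: draw instances uniformly in a neighbourhood of x_0 and widen the neighbourhood until the black-box prediction flips. The radius schedule and draw counts are arbitrary assumptions; the growing number of draws as the radius expands is exactly the expense the statement points out.

```python
# Hypothetical sketch: random counterfactual search with a growing neighbourhood.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
f = RandomForestClassifier(random_state=0).fit(X, y)

def random_counterfactual(f, x0, radius=0.1, grow=1.5,
                          n_draws=300, max_rounds=20, seed=0):
    y0 = f.predict(x0[None, :])[0]
    rng = np.random.default_rng(seed)
    for _ in range(max_rounds):
        # uniform draws in a hypercube of the current radius around x0
        Z = x0 + rng.uniform(-radius, radius, size=(n_draws, len(x0)))
        flipped = Z[f.predict(Z) != y0]
        if len(flipped) > 0:
            # return the closest class-flipping draw found
            d = np.linalg.norm(flipped - x0, axis=1)
            return flipped[d.argmin()]
        radius *= grow   # no flip yet: widen the neighbourhood (more cost)
    return None

x_cf = random_counterfactual(f, X[0])
print("counterfactual found:", x_cf is not None)
```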