2016 IEEE International Conference on Big Data (Big Data) 2016
DOI: 10.1109/bigdata.2016.7841024

“Influence sketching”: Finding influential samples in large-scale regressions

Abstract: There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. In order to solve the "needle in the haystack" problem of which samples to inspect, we develop a new scalable version of Cook's distance, a classical statistical technique for identifying samples which unusually strongly impact the fit of a regression model (a…

Citations: cited by 19 publications (12 citation statements)
References: 13 publications
“…To explain differentiable parametric predictive models, such as deep networks [19] and linear models [26,47], the gradients of the output with respect to the parameters and input data [39] can signify key factors that explain the output. However, graphical models aim to model long range and more complicated types of interaction among variables.…”
Section: Related Work | mentioning
confidence: 99%
“…First, the instance attribution methods find the training instances that significantly influence the prediction on an instance to be interpreted. Wojnowicz et al. [45] used influence sketching to identify the training instances that strongly affect the fit of a regression model by efficiently estimating Cook's distance [9]. Koh et al. [22] proposed influence functions to trace the prediction of a model and identify the training instances that are the most responsible for the prediction.…”
Section: Related Work | mentioning
confidence: 99%
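For readers unfamiliar with the quantity mentioned in the statement above, the following is a minimal sketch of the classical, exact Cook's distance for a simple linear regression with an intercept and one slope. It is not the influence sketching approximation from the cited paper. The function name `cooks_distances` and the toy data are hypothetical and chosen only for illustration.

```python
import statistics


def cooks_distances(x, y):
    """Exact Cook's distance for each sample of a simple linear regression.

    Illustrative only: fits y = intercept + slope * x by ordinary least
    squares (p = 2 parameters) and applies the textbook formula
    D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2.
    """
    n = len(x)
    p = 2  # intercept and slope
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Unbiased estimate of the residual variance.
    s2 = sum(e * e for e in residuals) / (n - p)
    # Leverage of each sample (diagonal of the hat matrix).
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * s2)) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Toy data (hypothetical): a perfect line with one corrupted label at the end.
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi + 1.0 for xi in x]
y[-1] += 15.0

d = cooks_distances(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
print("most influential sample index:", most_influential)
```

Ranking samples by this score and inspecting the largest ones first is the prioritization idea the abstract describes. The exact computation shown here is the kind of step a scalable approximation would aim to avoid on large data.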
“…where $\Omega \in \mathbb{R}^{n \times k}$ is a matrix of random numbers and $c$ is a scalar for norm preservation that depends upon the random projection method used. [Footnotes 1, 2] Random projection is a computationally cheap technique that approximately preserves pairwise distances between samples with high probability (with error depending on $k$ and $m$). [Footnote 1: Where possible, vectors in $\mathbb{R}^m$ are denoted by $u$ and vectors in $\mathbb{R}^n$ are denoted by $v$.] [Footnote 2: Note that a random projection is technically not actually a projection (an endomorphism $P: X \to X$ that satisfies $P^2 = P$).]…”
Section: A Random Projection (Rp) | mentioning
confidence: 99%
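The random projection described in the statement above can be sketched as follows. The quoted text does not show the projection equation itself, so this example assumes one common choice: a Gaussian matrix Ω with i.i.d. standard normal entries and the scale c = 1/sqrt(k). The function name `random_projection` and the toy sizes are hypothetical, and the code uses only the standard library so that it runs anywhere (in practice a numerical library would be used).

```python
import math
import random


def random_projection(X, k, seed=0):
    """Map each row of X (m samples, n features) to k dimensions.

    Assumed variant: Gaussian random projection, Y = c * X * Omega,
    with Omega of shape n x k and c = 1 / sqrt(k).
    """
    rng = random.Random(seed)
    n = len(X[0])
    # Store Omega column by column: omega_cols[j] is the j-th column (length n).
    omega_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    c = 1.0 / math.sqrt(k)  # keeps squared norms unchanged in expectation
    return [
        [c * sum(xi * wi for xi, wi in zip(row, col)) for col in omega_cols]
        for row in X
    ]


# Toy data (hypothetical): m = 8 samples with n = 500 features, reduced to k = 200.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(8)]
Y = random_projection(X, k=200)

# Ratio of projected to original distance for every pair of samples.
ratios = [
    math.dist(Y[a], Y[b]) / math.dist(X[a], X[b])
    for a in range(len(X))
    for b in range(a + 1, len(X))
]
print("distance ratios range from", min(ratios), "to", max(ratios))
```

Ratios close to 1 indicate that pairwise distances are approximately preserved. Increasing `k` tightens the spread, which matches the statement that the error depends on k and on the number of samples.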
“…Stochastic DR may be applied as a preprocessing step before feeding the data into a computationally expensive classifier, e.g. a neural network [1], or it may be directly embedded within algorithms to improve scalability [2].…”
mentioning
confidence: 99%