2021
DOI: 10.1007/978-3-030-86520-7_41
Finding High-Value Training Data Subset Through Differentiable Convex Programming

Cited by 5 publications (5 citation statements)
References 5 publications
“…Therefore, our framework shares the same spirit as the traditional label-efficiency research. Data valuation: in the literature, other than active learning, there exist many techniques to quantify the importance of individual samples, e.g., the influence function [Koh and Liang 2017] and its variants [Wu, Weimer, and Davidson 2021], Glister [Killamsetty et al 2021], HOST-CP [Das et al 2021], TracIn [Pruthi et al 2020], DVRL [Yoon, Arik, and Pfister 2020] and the Data Shapley value [Ghorbani and Zou 2019]. However, among these methods, the Data Shapley value [Ghorbani and Zou 2019] is very computationally expensive, while the others rely on the assumption that a set of "clean" validation samples (or meta samples) is given, which makes them unsuitable for our framework (we discuss the Data Shapley value and its extensions in more detail in Appendix "Appendix: more related work").…”
Section: Related Work
confidence: 99%
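The computational expense attributed to the Data Shapley value above comes from repeated retraining: its standard Monte Carlo estimator averages marginal utility gains over random permutations, costing one model-training ("utility") call per point per permutation. A minimal sketch of that estimator, where the function name and the toy additive utility are illustrative assumptions rather than anything from the cited papers:

```python
import numpy as np

def data_shapley_mc(utility, n_points, n_perms=200, rng=None):
    """Monte Carlo estimate of Data Shapley values.

    utility: callable mapping a list of point indices to a scalar score
             (in practice, validation accuracy of a model retrained on
             that subset -- the expensive part).
    """
    rng = np.random.default_rng(rng)
    values = np.zeros(n_points)
    for _ in range(n_perms):
        perm = rng.permutation(n_points)
        prev_u = utility([])
        subset = []
        for i in perm:
            subset.append(i)
            u = utility(subset)          # one "retraining" per point per pass
            values[i] += u - prev_u      # marginal contribution of point i
            prev_u = u
    return values / n_perms

# Toy additive utility: fraction of "clean" points in the subset.
# A mislabeled point (index 2) never improves utility, so its value is 0.
clean = np.array([1.0, 1.0, 0.0, 1.0])
util = lambda idx: clean[list(idx)].sum() / len(clean) if idx else 0.0
vals = data_shapley_mc(util, n_points=4, n_perms=50, rng=0)
```

Each estimate costs `n_perms * n_points` utility evaluations, i.e., model retrainings in the realistic setting, which is why the quoted passage calls the method very computationally expensive.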
“…It finds application in subset selection, providing explanations for predictions, diagnosing mislabelled examples, and so on. There have been several works in the data-valuation literature encompassing influence functions [10], Shapley values [7], reinforcement learning [23], differentiable convex programming [5], tracking training trajectories [14] and more. However, apart from [10] and [14], all of the above-mentioned techniques are expensive to scale to large datasets and models, since they merge the training and scoring datapoints in a combined framework.…”
Section: Related Work
confidence: 99%
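Of the cheaper methods singled out above, TracIn [Pruthi et al 2020] (the trajectory-tracking approach, [14]) scores a training point by summing, over saved checkpoints, the learning rate times the dot product of its loss gradient with the test point's loss gradient. A hedged numpy sketch for a linear model with squared loss; the function name and toy setup are my own, not from the cited work:

```python
import numpy as np

def tracin_score(checkpoints, lrs, x_train, y_train, x_test, y_test):
    """TracIn-style influence of one training point on one test point:
    sum over checkpoints of lr * <grad_train, grad_test>, for a linear
    model w with squared loss, whose gradient is 2*(w@x - y)*x."""
    score = 0.0
    for w, lr in zip(checkpoints, lrs):
        g_tr = 2.0 * (w @ x_train - y_train) * x_train
        g_te = 2.0 * (w @ x_test - y_test) * x_test
        score += lr * float(g_tr @ g_te)
    return score

# A training point that matches the test point (same features, same label)
# acts as a proponent; the same features with a flipped label, an opponent.
ckpts, lrs = [np.zeros(2)], [0.1]
x = np.array([1.0, 0.0])
proponent = tracin_score(ckpts, lrs, x, 1.0, x, 1.0)
opponent = tracin_score(ckpts, lrs, x, -1.0, x, 1.0)
```

Because the score only needs per-example gradients at already-saved checkpoints, no joint retraining over subsets is required, which is the scalability advantage the quoted passage attributes to [14].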
“…Finding "influential" datapoints in a training dataset, also known as data valuation [10, 14] and data-subset selection [5], has emerged as an important sub-problem for many modern deep-learning application domains, e.g., Data-centric AI [7], explainability in trusted AI [15], [2], debugging the training process [10], and scalable supervised deep learning [9].…”
Section: Introduction
confidence: 99%