2018
DOI: 10.48550/arxiv.1805.11191
Preprint

Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks

Abstract: Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry and pose the challenges of not having adequate computing resources and of high costs involved in human labeling efforts. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges respectively. A special class of subset selection functions naturally model notions of diversity, coverage and representation and they can be used to eliminate redund…

Cited by 4 publications (7 citation statements)
References 36 publications
“…Choosing data points farthest from the cluster centers leads to the worst performance. This is consistent with previous findings (Kaushal et al., 2018; Birodkar et al., 2019) that data points farthest from cluster centers are usually outliers and less representative. Selecting them might mislead the model to capture non-generic patterns and thereby generalize poorly.…”
Section: Variance of Model Performance (Table 1) (supporting)
confidence: 93%
“…This naturally matches the objectives of k-means clustering which minimizes the within-cluster variances while maximizing the between-cluster variances to encourage the diversity and representativeness of each cluster (Krishna and Murty, 1999; Kanungo et al., 2002). As has been shown in image classification tasks, data points closer to the cluster centroids are usually most important, while other faraway points can even be safely removed without hurting model performance (Kaushal et al., 2018; Birodkar et al., 2019). Inspired by this, we propose a simple selection strategy which first clusters the whole unlabeled dataset with the K-means algorithm, and then from each cluster, selects the data point that is closest to the cluster centroid.…”
Section: Introduction (mentioning)
confidence: 96%
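The strategy described in the quote above (cluster the unlabeled pool with k-means, then from each cluster take the point nearest its centroid) can be sketched in a few lines of numpy. This is an illustrative sketch, not the cited authors' implementation; the function name and the plain Lloyd-style k-means loop are my own.

```python
import numpy as np

def kmeans_select(X, k, n_iter=50, seed=0):
    """Cluster X with plain Lloyd-style k-means, then return the index
    of the point closest to each centroid (one per non-empty cluster)."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k distinct random points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign every point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its members
        for j in range(k):
            members = labels == j
            if members.any():
                centroids[j] = X[members].mean(axis=0)
    # final assignment, then pick the member nearest each centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    selected = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members):
            selected.append(int(members[d[members, j].argmin()]))
    return selected
```

With k set to the labeling budget, the returned indices form the subset to label; points far from every centroid are simply never chosen, matching the quoted observation that such outliers are the least representative.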
“…[Wei et al., 2015] pose the problem of subset selection as a constrained submodular maximization problem and use it to propose an active learning algorithm. The proposed techniques are used by [Kaushal et al., 2018] in the context of image recognition tasks. The drawback, however, is that when used with deep neural networks, simple uncertainty-based strategies outperform the mentioned algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
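Constrained monotone submodular maximization of the kind mentioned above is typically solved greedily, since the greedy algorithm carries a (1 − 1/e) approximation guarantee (Nemhauser et al.). As an illustration (not Wei et al.'s exact formulation), here is a minimal greedy maximizer for the facility-location function f(S) = Σ_i max_{j∈S} sim[i, j], a standard representativeness objective in this literature; the function name is illustrative.

```python
import numpy as np

def facility_location_greedy(sim, budget):
    """Greedily maximize the facility-location function
    f(S) = sum_i max_{j in S} sim[i, j], a monotone submodular
    measure of how well the chosen set S "covers" the whole pool."""
    n = sim.shape[0]
    selected = []
    best = np.zeros(n)  # best[i] = max similarity of point i to S so far
    for _ in range(budget):
        # marginal gain of adding each candidate column j
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[selected] = -np.inf  # never re-pick chosen points
        j = int(gains.argmax())
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected
```

Here `sim` is any n × n pairwise similarity matrix (e.g. an RBF kernel over feature embeddings); each greedy step adds the point that most improves coverage of the points not yet well represented.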
“…We train a 32-layer ResNet for the CIFAR-10 and CIFAR-100 [Krizhevsky and Hinton, 2009] datasets. The semantic representation obtained was a 64-dimensional vector.…”
Section: CIFAR-10 and CIFAR-100 (mentioning)
confidence: 99%