2018
DOI: 10.48550/arxiv.1806.01547
Preprint
Semi-Supervised Clustering with Neural Networks

Abstract: Clustering using neural networks has recently demonstrated promising performance in machine learning and computer vision applications. However, the performance of current approaches is limited either by unsupervised learning or by their dependence on a large set of labeled data samples. In this paper, we propose ClusterNet, which uses pairwise semantic constraints from very few labeled data samples (< 5% of total data) and exploits the abundant unlabeled data to drive the clustering approach. We define a new loss fun…
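The pairwise semantic constraints mentioned in the abstract can be sketched as a simple loss over soft cluster assignments. This is an illustrative formulation only, not ClusterNet's actual loss function (which is truncated above): must-link pairs are pushed toward agreeing assignments and cannot-link pairs toward disagreeing ones.

```python
import numpy as np

def pairwise_constraint_loss(p, must_link, cannot_link):
    """Illustrative pairwise semantic-constraint loss (not ClusterNet's
    exact formulation).

    p           : (n, k) soft cluster assignments, rows summing to 1.
    must_link   : (i, j) index pairs known to share a class.
    cannot_link : (i, j) index pairs known to differ.

    Agreement between two points is the dot product of their assignment
    vectors; must-link pairs are driven toward agreement 1, cannot-link
    pairs toward agreement 0.
    """
    loss = 0.0
    for i, j in must_link:
        loss += 1.0 - float(p[i] @ p[j])
    for i, j in cannot_link:
        loss += float(p[i] @ p[j])
    return loss / max(1, len(must_link) + len(cannot_link))

# Toy assignments: points 0 and 1 mostly agree, point 2 differs.
p = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9]])
loss = pairwise_constraint_loss(p, must_link=[(0, 1)], cannot_link=[(0, 2)])
```

Only the labeled pairs enter this term; the unlabeled data would be handled by a separate unsupervised loss, as the abstract suggests.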

Cited by 5 publications (7 citation statements)
References 15 publications
“…The authors of [27] used a KL-divergence-based loss to train a DNN to predict cluster distribution from pairwise relations; one limitation of that method is its inability to use unlabeled data. Other works [28,29,30] used autoencoders with reconstruction losses to exploit inner characteristics of unlabeled data.…”
Section: Related Work
confidence: 99%
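The KL-divergence-based clustering loss this statement refers to can be illustrated with the self-sharpening target distribution popularized by DEC-style methods; the cited paper's exact loss is not reproduced here, so treat this as a representative sketch.

```python
import numpy as np

def dec_target(q):
    """Sharpened auxiliary target distribution P from soft assignments Q,
    as used in DEC-style deep clustering (illustrative, not the cited
    paper's exact definition)."""
    w = q ** 2 / q.sum(axis=0)          # emphasise confident assignments
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(q, eps=1e-12):
    """KL(P || Q): trains the network's assignments Q toward the
    self-sharpened target P."""
    p = dec_target(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Moderately confident soft assignments for three points, two clusters.
q = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.2, 0.8]])
loss = kl_clustering_loss(q)
```

The loss shrinks as assignments become confident, which is why such objectives can operate without labels; the limitation noted above concerns incorporating unlabeled data into the *pairwise-relation* variant specifically.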
“…Representation Learning & Clustering Our approach relies on estimating unknown subclass labels by clustering a feature representation of the data. Techniques for learning semantically useful image features include autoencoder-based methods [32,46], the use of unsupervised auxiliary tasks [2,9], and pretraining on massive datasets [27]. Such features may be used for unsupervised identification of classes, either using clustering techniques [6] or an end-to-end approach [25,18].…”
Section: Related Work
confidence: 99%
“…This has been an area of substantial recent activity in machine learning, and has provided several important conclusions upon which we build in our work. The work of [32] and [46], for instance, demonstrates the utility of a simple autoencoded representation for performing unsupervised clustering in the feature space of a trained model. While the purpose of these works is often to show that deep clustering can be competitive with semi-supervised learning techniques, the mechanics of clustering in model feature space explored by these works are important for our present study.…”
Section: Neural Representation Clustering
confidence: 99%
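Clustering in the feature space of a trained model, as described above, amounts to encoding the data and running a standard clusterer on the codes. A minimal sketch, with a fixed linear map standing in for a trained autoencoder's encoder and hypothetical blob data in place of real features:

```python
import numpy as np

def kmeans2(z, iters=20):
    """Minimal 2-cluster k-means with deterministic farthest-point init."""
    c0 = z[0]
    c1 = z[np.argmax(np.linalg.norm(z - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        d = np.linalg.norm(z[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = z[labels == c].mean(axis=0)
    return labels

# Hypothetical data: two well-separated 4-D blobs standing in for inputs.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.3, (20, 4)),
               rng.normal(3.0, 0.3, (20, 4))])

# Placeholder linear "encoder"; in the cited works this would be the
# encoder half of a trained autoencoder.
encoder = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0],
                    [0.5, -0.5]])
z = x @ encoder
labels = kmeans2(z)
```

The point of the cited works is that a *learned* encoder makes the latent space far more clusterable than raw pixels; the fixed projection here only mimics the pipeline's shape.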
“…Clustering The problem of clustering can be broadly defined as a label assignment task where data points with similar features are to be assigned the same label. Recently, deep neural networks have been utilized to perform this task in a supervised or semi-supervised [25,33] and unsupervised [4,23,37] setting. Similar to our work, some prior research also explores a voting mechanism [3,9,15,25] for clustering.…”
Section: Related Work
confidence: 99%
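A voting mechanism of the kind mentioned in this statement can be sketched as mapping each cluster to the majority class among its few labeled members. The helper below is hypothetical and not the exact mechanism of any cited paper:

```python
import numpy as np
from collections import Counter

def vote_cluster_labels(cluster_ids, known_idx, known_labels):
    """Map each cluster id to the majority class among its labeled
    members (hypothetical voting helper for illustration)."""
    mapping = {}
    for c in np.unique(cluster_ids):
        votes = [lbl for i, lbl in zip(known_idx, known_labels)
                 if cluster_ids[i] == c]
        mapping[int(c)] = Counter(votes).most_common(1)[0][0] if votes else None
    return mapping

# 7 points in 2 clusters; only 3 carry known labels (in practice the
# labeled fraction would be far smaller, e.g. <5% as in the abstract).
cluster_ids = np.array([0, 0, 0, 1, 1, 1, 1])
known_idx = [0, 2, 3]
known_labels = ["cat", "cat", "dog"]
mapping = vote_cluster_labels(cluster_ids, known_idx, known_labels)
```

Once the mapping is fixed, every unlabeled member of a cluster inherits that cluster's voted class, which is how such schemes turn a clustering into a classifier.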