Semi-supervised learning is drawing increasing attention in the era of
big data, as the gap between the abundance of cheap, automatically collected
unlabeled data and the scarcity of labeled data that are laborious and expensive to
obtain is widening dramatically. In this paper, we first introduce a unified view
of density-based clustering algorithms. We then build upon this view and bridge the
areas of semi-supervised clustering and classification under a common umbrella of
density-based techniques. We show that there are close relations between
density-based clustering algorithms and the graph-based approach for transductive
classification. These relations are then used as a basis for a new framework for
semi-supervised classification based on building blocks from density-based
clustering. This framework is not only efficient and effective but also
statistically sound. In addition, we generalize the core algorithm in our framework,
HDBSCAN*, so that it can also perform semi-supervised clustering by directly taking
advantage of any fraction of labeled data that may be available. Experimental
results on a large collection of datasets show the advantages of the proposed
approach for both semi-supervised classification and semi-supervised
clustering.