2017 IEEE International Conference on Data Mining Workshops (ICDMW) 2017
DOI: 10.1109/icdmw.2017.12

Accelerated Hierarchical Density Based Clustering

Abstract: We present an accelerated algorithm for hierarchical density-based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter ε. This makes accelerated HDBSCAN* the default choice for density-based clustering. arXiv:1705.07321v2…
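HDBSCAN* builds its cluster hierarchy from a minimum spanning tree over the mutual reachability distance max(core_k(a), core_k(b), d(a, b)), where core_k is the distance from a point to its k-th nearest neighbour. The sketch below illustrates only that step with a naive O(n²) Prim's algorithm. It assumes Euclidean points and counts each point as its own first neighbour. It is not the accelerated algorithm the paper describes, and the function names are hypothetical.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour.

    Assumption: the point itself counts as its 1st neighbour (requires k <= n).
    """
    return [sorted(math.dist(p, q) for q in points)[k - 1] for p in points]


def mutual_reachability(points, cores, i, j):
    """Mutual reachability distance between points i and j."""
    return max(cores[i], cores[j], math.dist(points[i], points[j]))


def mst_prim(points, cores):
    """Naive O(n^2) Prim's MST over mutual reachability; returns edges sorted by weight."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n       # tree vertex that provides that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Attach the outside vertex with the cheapest connecting edge.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the edges from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                w = mutual_reachability(points, cores, u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return sorted(edges, key=lambda e: e[2])


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = mst_prim(pts, core_distances(pts, 2))
```

On these two well-separated groups of three points with k = 2, the four lightest edges (weight 1.0) join points within each group, and the single heaviest edge (weight √181) bridges the groups. Removing that edge separates the two groups, which is the kind of cut the hierarchy is read from.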

Cited by 322 publications (225 citation statements)
References: 50 publications
“…The gene-frequency-by-inverse cell-frequency matrix was further reduced to a gene-by-context matrix by using a t-distributed stochastic neighbour embedding (t-SNE) (Maaten & Hinton, 2008) to reduce the cosine distance of the first 50 eigenvectors of the gene-frequency-by-inverse cell-frequency matrix (acquired using singular value decomposition). This reduced dimensionality gene-by-context matrix was then clustered using HDBSCAN (McInnes & Healy, 2017) to spatially select clusters based on density in the reduced dimensionality representation. This has the benefit of identifying the sets of genes that are repeatedly observed together in the same context (subsets of cells), while simultaneously attenuating the signal from frequently expressed genes unless accompanied by a drastic change in expression level.…”
Section: Analysis of Single-Cell RNA-Sequencing Results
mentioning
confidence: 99%
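The quoted pipeline starts from a gene-frequency-by-inverse-cell-frequency matrix. This is the single-cell analogue of TF-IDF, with cells playing the role of documents and genes the role of terms. The excerpt does not give the exact formula, so the sketch below assumes the textbook TF-IDF form: within-cell relative frequency times log(N / number of cells expressing the gene). The function name and the cells-by-genes input layout are hypothetical. The sketch shows why a gene detected in every cell is attenuated: its inverse cell frequency is log(1) = 0. The later SVD, t-SNE and HDBSCAN steps are not reproduced here.

```python
import math


def gf_icf(counts):
    """Gene-frequency x inverse-cell-frequency weighting (assumed TF-IDF form).

    counts[c][g] is the (hypothetical) expression count of gene g in cell c.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    # Number of cells in which each gene is detected (count > 0).
    cells_with_gene = [
        sum(1 for c in range(n_cells) if counts[c][g] > 0) for g in range(n_genes)
    ]
    # Inverse cell frequency: a gene seen in every cell gets log(1) = 0.
    icf = [math.log(n_cells / df) if df > 0 else 0.0 for df in cells_with_gene]
    weighted = []
    for row in counts:
        total = sum(row)
        # Relative frequency of each gene within the cell, scaled by its icf.
        weighted.append([(x / total if total else 0.0) * icf[g] for g, x in enumerate(row)])
    return weighted


w = gf_icf([[5, 0, 1], [3, 2, 0], [4, 0, 0]])
```

Gene 0 is expressed in all three cells, so its column is zeroed regardless of how large its counts are. The genes seen in only one cell keep a positive weight.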
“…For structure identification problems in the third stage, the central task is to automatically combine feature vectors of the same kind into a single group. Cluster analysis encompasses different algorithms involving K‐means, hierarchical clustering or density‐based spatial clustering of applications with noise (DBSCAN). The performance of each approach relies on the quality of the input feature vectors retrieved in the second stage.…”
Section: Structure Identification via Machine Learning
mentioning
confidence: 99%
“…Cluster analysis encompasses different algorithms involving K-means,[107] hierarchical clustering or density-based spatial clustering of applications with noise (DBSCAN).[108][109][110] The performance of each approach relies on the quality of the input feature vectors retrieved in the second stage. In the following subsection, we will discuss the advantages and disadvantages of various clustering methods employed in different scenarios.…”
Section: Structure Identification via Machine Learning
mentioning
confidence: 99%
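For contrast with HDBSCAN*, here is a minimal DBSCAN sketch that makes the distance scale parameter ε (here eps) explicit. It assumes Euclidean points and a naive O(n²) neighbourhood search, and it labels noise as -1. It is an illustration only, not a library implementation.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point (NOISE = -1)."""
    n = len(points)
    # eps-neighbourhoods include the point itself, as in the original DBSCAN definition.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # only core points expand the cluster
        cluster += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
```

With eps=1.5 and min_pts=3, the two tight groups become clusters 0 and 1 and the isolated point is noise. With eps=0.5, every point is labelled noise. This is the sensitivity to ε that the abstract describes as hard to tune.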
“…Example 3. The "hierarchical" version HDBSCAN* [1], [4] of DBSCAN* is the production (and interpretation) of a tree that is associated to the string of functions $\pi_0 L_{s_0,k}(X) \to \pi_0 L_{s_1,k}(X) \to \cdots \to \pi_0 L_{s_p,k}(X) = *$…”
Section: Contents
mentioning
confidence: 99%
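The sequence of maps in the quotation can be made concrete under one assumption about L_{s,k}(X), which the excerpt does not define. Here it is taken to be the graph whose vertices are the points with k-nearest-neighbour core distance at most s, with edges joining such points that lie within distance s of each other (DBSCAN* at ε = s). π_0 is then the set of connected components. Because vertices and edges only accumulate as s grows, each component at s_i lies inside a unique component at s_{i+1}, so the maps are well defined. The function names are hypothetical.

```python
import math


def core_distance(points, i, k):
    """Distance to the k-th nearest neighbour, counting the point itself."""
    return sorted(math.dist(points[i], q) for q in points)[k - 1]


def components_at_scale(points, s, k):
    """Label connected components of the assumed graph L_{s,k}; non-core points get None."""
    n = len(points)
    core = [core_distance(points, i, k) <= s for i in range(n)]
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if core[i] and core[j] and math.dist(points[i], points[j]) <= s:
                parent[find(i)] = find(j)
    return [find(i) if core[i] else None for i in range(n)]


def component_maps(points, scales, k):
    """For increasing scales s_0 < s_1 < ..., map each component at s_i to the
    component that contains it at s_{i+1} (the maps pi_0 L_{s_i,k} -> pi_0 L_{s_{i+1},k})."""
    labels = [components_at_scale(points, s, k) for s in scales]
    maps = []
    for lo, hi in zip(labels, labels[1:]):
        maps.append({a: b for a, b in zip(lo, hi) if a is not None})
    return labels, maps


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels, maps = component_maps(pts, [1.0, 5.0, 10.0], 2)
```

Here the two pairs of points form two components at s = 1 and s = 5. They merge into a single component, the terminal $*$, at s = 10.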
“…Scoring appears as an analytic device in statistical approaches to clustering -see [4], for example. An alternative combinatorial method of scoring is presented in the third section of this paper.…”
Section: Introduction
mentioning
confidence: 99%