Accelerated Hierarchical Density Based Clustering

McInnes, Leland; Healy, John J.

doi:10.1109/icdmw.2017.12

Cited by 322 publications

(225 citation statements)

References 50 publications

Supporting

Mentioning

199

Contrasting

Unclassified

“…The gene-frequency-by-inverse cell-frequency matrix was further reduced to a gene-by-context matrix by using a t-distributed stochastic neighbour embedding (t-SNE) (Maaten & Hinton, 2008) to reduce the cosine distance of the first 50 eigenvectors of the gene-frequency-by-inverse cell-frequency matrix (acquired using singular value decomposition). This reduced dimensionality gene-by-context matrix was then clustered using HDBSCAN (McInnes & Healy, 2017) to spatially select clusters based on density in the reduced dimensionality representation. This has the benefit of identifying the sets of genes that are repeatedly observed together in the same context (subsets of cells), while simultaneously attenuating the signal from frequently expressed genes unless accompanied by a drastic change in expression level.…”

Section: Analysis Of Single-cell RNA-sequencing Results

Citation type: mentioning

confidence: 99%

Biologically indeterminate yet ordered promiscuous gene expression in single medullary thymic epithelial cells

Dhalla, Baran-Gale, Maio, et al. 2019

Preprint

During thymic negative selection, medullary thymic epithelial cells (mTEC) collectively express most protein coding genes, a process termed promiscuous gene expression (PGE). Although PGE is crucial for inducing central T-cell tolerance, this process has not been established definitively as being either stochastic or coordinated. To resolve this question, we sequenced the transcriptomes of 6,894 single mTEC, including 1,795 rare cells expressing either of two tissue-restricted antigens, TSPAN8 or GP2. Transcriptional heterogeneity allowed partitioning of mTEC into 15 robustly-defined subpopulations representing distinct maturational stages and subtypes. Although 50 gene co-expression groups were robustly identified, few could be explained by chromosomal location, biological pathway, or tissue specificity. Further, GP2+ mTEC were randomly dispersed spatially within medullary islands. Thus although PGE exhibits ordered co-expression, biologically it is indeterminate. This likely enhances the presentation of diverse antigens to passing thymocytes during their medullary residency, while simultaneously maintaining mTEC identity throughout PGE.
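The citation statement above clusters a reduced-dimensionality embedding with HDBSCAN, selecting clusters by density. As a rough illustration of the density notion HDBSCAN builds on, the sketch below computes core distances and mutual reachability distances, where the mutual reachability distance between two points is the maximum of their two core distances and their ordinary distance. The function names, the toy points, and the convention of excluding a point from its own neighbour list are assumptions made for illustration; none of this is taken from the cited preprint or from any library's API.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour.

    Assumption: the point itself is excluded from its own neighbour list;
    other conventions count the point as its own first neighbour.
    """
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, k):
    """Matrix of max(core_k(a), core_k(b), d(a, b)) for every pair of points."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(cores[i], cores[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy data: three nearby points and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
mr = mutual_reachability(points, k=2)
for row in mr:
    print([round(value, 3) for value in row])
```

Points in sparse regions have large core distances, so their mutual reachability distance to everything else is inflated. This is what lets a hierarchy built on these distances separate dense groups from isolated points.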

“…For structure identification problems in the third stage, the central task is to automatically combine feature vectors of the same kind into a single group. Cluster analysis encompasses different algorithms involving K‐means, hierarchical clustering or density‐based spatial clustering of applications with noise (DBSCAN). The performance of each approach relies on the quality of the input feature vectors retrieved in the second stage.…”

Section: Structure Identification Via Machine Learning

Citation type: mentioning

confidence: 99%

“…Cluster analysis encompasses different algorithms involving K-means,[107] hierarchical clustering or density-based spatial clustering of applications with noise (DBSCAN).[108][109][110] The performance of each approach relies on the quality of the input feature vectors retrieved in the second stage. In the following subsection, we will discuss the advantages and disadvantages of various clustering methods employed in different scenarios.…”

Section: Structure Identification Via Machine Learning

Citation type: mentioning

confidence: 99%

A machine perspective of atomic defects in scanning transmission electron microscopy

Dan, Zhao, Pennycook 2019

InfoMat

Enabled by the advances in aberration‐corrected scanning transmission electron microscopy (STEM), atomic‐resolution real space imaging of materials has allowed a direct structure‐property investigation. Traditional ways of quantitative data analysis suffer from low yield and poor accuracy. New ideas in the field of computer vision and machine learning have provided more momentum to harness the wealth of big data and sophisticated information in STEM data analytics, which has transformed STEM from a localized characterization technique to a macroscopic tool with intelligence. In this review article, we discuss the prime significance of defect topology and density in two‐dimensional (2D) materials, which have proved to be a powerful means to tune a wide range of properties. Subsequently, we systematically review advanced data analysis methods that have demonstrated promising prospects in analyzing STEM data, particularly for identifying structural defects, with high throughput and veracity. A unified framework for atomic structure identification is also summarized.
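The citation statements for this review name DBSCAN among the clustering options for grouping feature vectors. A minimal sketch of the textbook DBSCAN procedure, using only the Python standard library, is given below. The parameter names `eps` and `min_pts`, the helper `region_query`, and the toy points are assumptions chosen for illustration, and the brute-force neighbour search is for readability only; this is not the implementation used in the review or in the HDBSCAN paper.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep growing
    return labels


# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (20, 20)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

A single global `eps` is the main limitation of this scheme when clusters differ in density. Hierarchical variants such as HDBSCAN address it by considering a range of density thresholds rather than one.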

“…Example 3. The "hierarchical" version HDBSCAN* [1], [4] of DBSCAN* is the production (and interpretation) of a tree that is associated to the string of functions $\pi_0 L_{s_0,k}(X) \to \pi_0 L_{s_1,k}(X) \to \cdots \to \pi_0 L_{s_p,k}(X) = *$…”

Section: Contents

Citation type: mentioning

confidence: 99%

“…Scoring appears as an analytic device in statistical approaches to clustering - see [4], for example. An alternative combinatorial method of scoring is presented in the third section of this paper.…”

Section: Introduction

Citation type: mentioning

confidence: 99%