2019 IEEE International Conference on Big Data (Big Data) 2019
DOI: 10.1109/bigdata47090.2019.9006065

High Dimensional Data Clustering by means of Distributed Dirichlet Process Mixture Models

Abstract: Clustering is a data mining technique intensively used for data analytics, with applications to marketing, security, text/document analysis, or sciences like biology, astronomy, and many more. Dirichlet Process Mixture (DPM) is a model used for multivariate clustering with the advantage of discovering the number of clusters automatically and offering favorable characteristics. However, in the case of high dimensional data, it becomes an important challenge with numerical and theoretical pitfalls. The advantage…
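
The abstract's claim that a DPM discovers the number of clusters automatically can be illustrated with the Chinese Restaurant Process, the standard predictive view of a Dirichlet process prior. The sketch below is only an illustration of that prior, not the distributed algorithm of the paper; the function name `crp_assignments` and the parameters `alpha` and `seed` are assumptions introduced here. Each new point joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to `alpha`, so the number of clusters is never fixed in advance.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels for n_points from a Chinese Restaurant Process.

    alpha (> 0) is the concentration parameter: larger values favor more clusters.
    Returns (labels, counts), where counts[k] is the size of cluster k.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of points currently assigned to cluster k
    for _ in range(n_points):
        # Existing cluster k has weight counts[k]; a brand-new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_points=1000, alpha=2.0)
print(len(counts), "clusters; sizes:", sorted(counts, reverse=True))
```

In a full DPM sampler these prior weights would be multiplied by the likelihood of each point under each cluster's parameters; that step is omitted here.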

Cited by 4 publications (5 citation statements)
References 35 publications

“…This is one of the keys in distributed data science systems supporting algorithms often originally designed for centralized environments. Our solutions rely on sufficient statistics, by means of synchronization between machines in terms of proposed clusters at each step [1,2]. We provide an interactive demonstration of these previous results.…”
Section: Introduction
confidence: 69%
“…HD4C [2] presents a parallel clustering approach adapted for high dimensional data. Actually, DC-DPM is a solution proposed to this issue when data is multivariate.…”
Section: High Dimensional Data Distributed Dirichlet Clustering
confidence: 99%
“…In this research, the clustering of ESG scores uses K-means clustering algorithm analysis with R Studio tools. K-means has been widely used in previous research because it has several advantages, including: (1) it considers a collection of observations (x1, x2,…, xn); and (2) simplicity, fast convergence, and good scalability (Kwedlo and Czochanski, 2019;Meguelati et al, 2019).…”
Section: Data Analysis: ESG Performance Assessment and Company Cluste...
confidence: 99%
“…Clustering of time-series data has been an active area of research over the last few decades and many good techniques have been developed (c.f. [2,11,17] for surveys and [13] for some recent work). The challenge in clustering the smart meter data stems from:…”
Section: Related Work
confidence: 99%