2020
DOI: 10.1145/3403951

Density-based Algorithms for Big Data Clustering Using MapReduce Framework

Abstract: Clustering is used to extract hidden patterns and similar groups from data. Therefore, clustering as a method of unsupervised learning is a crucial technique for big data analysis owing to the massive number of unlabeled objects involved. Density-based algorithms have attracted research interest, because they help to better understand complex patterns in spatial datasets that contain information about data related to co-located objects. Big data clustering is a challenging task, because the volume of data incr…

Cited by 13 publications (8 citation statements)
References 78 publications
“…The HDFS is a scalable file system, has a high throughput, and can handle errors. Figure 1 shows the main tasks of MapReduce, which can be summed up in the following way [1], [10]: -Mappers process the input data, which is read from the HDFS file system, and produce output pairs of "key/value." -During the shuffling step, the outputs of the mappers are redirected to reducer nodes by the values of their respective "keys."…”
Section: MapReduce Framework (mentioning)
confidence: 99%
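The map, shuffle, and reduce steps described in the statement above can be sketched in a few lines of plain Python. This is only an in-memory illustration under stated assumptions: the word-count job, the function names (`mapper`, `shuffle`, `reducer`, `run_job`), and the list of input lines standing in for HDFS are all hypothetical and are not taken from the cited paper or from any Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit one (key, value) pair per word in the input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle step: group mapper outputs by key, as if routing each key
    to the reducer responsible for it."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over in-memory input.
    In a real cluster the input would be read from HDFS (assumption:
    a plain list of strings stands in for it here)."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

In an actual MapReduce deployment the three steps run on different nodes and the shuffle moves data over the network; the sketch keeps everything in one process to show only the data flow.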
“…In ZTA, we believe that AI technologies are the primary solution for automating cyber threat intelligence collection. The clustering algorithms of unsupervised learning can group different patterns of threat intelligence according to their similarity [6,70]. The log-based anomaly detection method uses AI technologies to realize automatic log monitoring and anomaly identification.…”
Section: Threat Intelligence Collection (mentioning)
confidence: 99%
“…These data mining techniques are also known as Knowledge Discovery in Database. The common ways used in mining data includes clustering [3], regression, classification, and association rule mining algorithms, among which, density-based clustering algorithm [4] as a branch of clustering is widely used in the geography, medicine, finance, and image analysis fields [5] because of its capability to discover clusters of arbitrary shapes in dense areas and handle noise or outliers effectively. Unfortunately, the huge volume of data generated in the traditional density-based clustering algorithms, can be too massive for a single machine to handle in a reasonable amount of time.…”
Section: Introduction (mentioning)
confidence: 99%
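To make the claim about arbitrary-shaped clusters and noise handling concrete, here is a minimal single-machine sketch of DBSCAN, used as a representative density-based algorithm. The quoted statement does not name a specific algorithm, so the choice of DBSCAN, the parameter names `eps` and `min_pts`, and the brute-force neighbor search are assumptions made for illustration only.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i]
    (brute force, O(n) per query)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

The brute-force neighbor search makes this quadratic in the number of points, which illustrates why a single machine struggles on large datasets and why distributed variants are of interest.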