2017
DOI: 10.1002/widm.1216

Data mining in distributed environment: a survey

Abstract: Due to the rapid growth of resource sharing, distributed systems are developed, which can be used to utilize the computations. Data mining (DM) provides powerful techniques for finding meaningful and useful information from a very large amount of data, and has a wide range of real‐world applications. However, traditional DM algorithms assume that the data is centrally collected, memory‐resident, and static. It is challenging to manage the large‐scale data and process them with very limited resources. For examp…

Cited by 121 publications (57 citation statements)
References 135 publications (278 reference statements)
“…Pattern (i.e., itemset, rule, and sequence) mining [20,33] is a kind of well-studied data mining and analytics model. The applications of pattern mining models are very extensive, and details can be referred to in the survey literature [11,15,18,19]. A great effort has been put forth by the data mining community to discover frequent patterns from itemset-based data, such as Apriori [3] and FP-growth [20] methods.…”
Section: Frequency-based Mining On Sequences (mentioning)
confidence: 99%
“…Knowing the useful patterns and auxiliary knowledge from sequences/events can benefit a number of applications, such as web access analysis, event prediction, time-aware recommendation, and DNA detection [11]. Up to now, research has been conducted on mining interesting patterns from transaction or sequential data [11,15,20,33]. However, most of them are based on the co-occurrence frequency of patterns.…”
Section: Introduction (mentioning)
confidence: 99%
“…There are some research opportunities for iHUIM to handle large‐scale databases: how to design a parallelized iHUIM algorithm, how to develop a iHUIM algorithm based on the existing big data technologies (e.g., MapReduce (Dean & Ghemawat, ), Spark (Zaharia, Chowdhury, Das, Dave, & Ma, )). Besides, other promising areas can be considered such as designing parallel, distributed, multicore, and GPU‐based algorithms (Gan, Lin, Chao, & Zhan, ) for iHUIM.…”
Section: Opportunities For iHUIM (mentioning)
confidence: 99%
“…By enhancement of the intermediate result storage with in-memory computations and generalization of the MapReduce pattern with a more flexible directed acyclic graph (DAG), SPARK has gained a high popularity (Landset, Khoshgoftaar, Richter, & Hasanin, 2015). Especially the in-memory processing of big data is a key technology for fast and responsive mining and can be found in many commercial products like SAP HANA or SAS products (Gan, Lin, Chao, & Zhan, 2017). For machine learning SPARK offers its own machine learning library MLlib but the open source aspect also allows the development of third party frameworks like H2O.…”
Section: Historical Development and State-of-the-art (mentioning)
confidence: 99%