Mining Very Large Databases with Parallel Processing (2000)
DOI: 10.1007/978-1-4615-5521-6

Cited by 83 publications (81 citation statements)
References 0 publications
“…The point is that the processing time taken per small disjunct is relatively short even when using a genetic algorithm, since there are just a few examples in the training set of a small disjunct. Finally, if necessary, the processing time taken by all the c * d GA runs can be considerably reduced by using parallel processing techniques [5]. Indeed, our method greatly facilitates the exploitation of parallelism in the discovery of small-disjunct rules, since each GA run is completely independent of the others and needs access only to a small data set, which can easily be kept in the local memory of a single processor node.…”
Section: Computational Results (mentioning)
confidence: 99%
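The scheme this statement describes is embarrassingly parallel: one GA run per small disjunct, with no communication between runs. Below is a minimal Python sketch of that structure only; the GA itself is replaced by a toy random search, and run_ga, mine_small_disjuncts, and the (feature, threshold) rule encoding are illustrative assumptions rather than details from the cited works.

```python
import random
from multiprocessing import Pool

def run_ga(disjunct):
    # Toy stand-in for one GA run: random search for a single
    # "feature > threshold" rule over this disjunct's few examples.
    # A real GA would evolve full rules here; the sketch only
    # demonstrates the parallel structure.
    best_rule, best_acc = None, -1.0
    for _ in range(200):
        feat = random.randrange(len(disjunct[0][0]))
        thresh = random.random()
        acc = sum((x[feat] > thresh) == y for x, y in disjunct) / len(disjunct)
        if acc > best_acc:
            best_rule, best_acc = (feat, thresh), acc
    return best_rule, best_acc

def mine_small_disjuncts(disjuncts, workers=4):
    # Each run is fully independent and touches only its own small
    # data set, so a plain process pool needs no communication.
    with Pool(workers) as pool:
        return pool.map(run_ga, disjuncts)

if __name__ == "__main__":
    # Eight tiny synthetic disjuncts of (features, boolean label) pairs.
    disjuncts = [[(tuple(random.random() for _ in range(5)),
                   random.random() > 0.5) for _ in range(10)]
                 for _ in range(8)]
    print(mine_small_disjuncts(disjuncts))
```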
“…One possibility is to use parallel processing techniques, since EAs can easily be parallelized in an effective way (Cantu-Paz 2000; Freitas & Lavington 1998; Freitas 2002a). Another possibility is to compute the fitness of individuals using only a subset of training instances, where that subset can be chosen either at random or with adaptive instance-selection techniques (Bhattacharyya 1998; Gathercole & Ross 1997; Sharpe & Glover 1999; Freitas 2002a).…”
Section: Discussion (mentioning)
confidence: 99%
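The subset-based fitness idea in this statement fits in a few lines. Everything here (the rule encoding, classify, subset_fitness, uniform sampling) is an illustrative assumption; the adaptive instance-selection techniques cited would bias the sample rather than draw it uniformly.

```python
import random

def classify(individual, x):
    # Hypothetical rule encoding: individual = (feature_index, threshold).
    feat, thresh = individual
    return x[feat] > thresh

def subset_fitness(individual, training_set, sample_size=100):
    # Score the individual on a random sample instead of the full
    # training set; sample_size caps the per-evaluation cost.
    sample = random.sample(training_set, min(sample_size, len(training_set)))
    correct = sum(classify(individual, x) == y for x, y in sample)
    return correct / len(sample)

if __name__ == "__main__":
    data = [((random.random(), random.random()), random.random() > 0.5)
            for _ in range(1000)]
    print(subset_fitness((0, 0.5), data))
```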
“…Dividing data by features [6] requires the workers to coordinate which input data instance falls into which tree node. This entails additional communication, which we try to avoid as we scale to very large data sets.…”
Section: Related Work (mentioning)
confidence: 99%
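To make the contrast concrete: under row (instance) partitioning, each worker can assign its own rows to tree nodes locally and ship only aggregate statistics to the master, whereas feature partitioning would force workers to exchange per-instance node assignments. A minimal sketch of the row-partitioned pattern, with hypothetical worker_stats and merge helpers:

```python
from collections import Counter

def worker_stats(rows, assign_node):
    # Each worker assigns its own rows to tree nodes locally and
    # returns only aggregate counts -- no per-instance messages.
    stats = Counter()
    for row in rows:
        stats[assign_node(row)] += 1
    return stats

def merge(all_worker_stats):
    # The master combines small aggregate summaries from every worker.
    total = Counter()
    for s in all_worker_stats:
        total.update(s)
    return total

if __name__ == "__main__":
    assign = lambda row: "left" if row[0] <= 0.5 else "right"
    partitions = [[(0.2,), (0.7,)], [(0.9,), (0.1,)]]  # two workers' rows
    print(merge(worker_stats(p, assign) for p in partitions))
```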
“…It builds the robust regression tree on the master by calculating the robust loss functions exactly in a distributed way. SRT refers to the distributed regression tree based on the squared-error criterion [17] in the Apache Spark machine learning tool set. Prior to tree induction, a pre-processing step is performed to obtain static, equi-depth histograms for each feature, and split points are always selected from the bins of these histograms during the training phase.…”
Section: Setup (mentioning)
confidence: 99%
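As a rough illustration of the pre-processing step this statement describes, the sketch below builds a static equi-depth histogram per feature and then considers only those bin boundaries as split candidates under a squared-error criterion. The bin count, function names, and in-memory data layout are assumptions for the sketch, not details of SRT or of Spark's implementation.

```python
import statistics

def equidepth_boundaries(values, bins=8):
    # Equi-depth histogram: each bin holds roughly the same number of
    # values, so the boundaries are evenly spaced order statistics.
    v = sorted(values)
    return [v[(i * len(v)) // bins] for i in range(1, bins)]

def best_split(rows, targets, boundaries_per_feature):
    # Evaluate only the precomputed boundaries, never raw values,
    # scoring each cut by the sum of squared errors of the children.
    def sse(ys):
        if not ys:
            return 0.0
        m = statistics.fmean(ys)
        return sum((y - m) ** 2 for y in ys)
    best_score, best_cut = float("inf"), None
    for feat, cuts in enumerate(boundaries_per_feature):
        for c in cuts:
            left = [t for r, t in zip(rows, targets) if r[feat] <= c]
            right = [t for r, t in zip(rows, targets) if r[feat] > c]
            score = sse(left) + sse(right)
            if score < best_score:
                best_score, best_cut = score, (feat, c)
    return best_cut

if __name__ == "__main__":
    import random
    rows = [(random.random(), random.random()) for _ in range(500)]
    targets = [x + 0.1 * y for x, y in rows]
    bounds = [equidepth_boundaries([r[f] for r in rows]) for f in range(2)]
    print(best_split(rows, targets, bounds))
```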