1999
DOI: 10.1007/3-540-46846-3_16

Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms

Abstract: Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest research challenges is to develop methods that make it possible to use large amounts of data. One possible approach for dealing with huge amounts of data is to take a random sample and do data mining on it, since for many data mining applications approximate answers are acceptable. However, as argued by several researchers, random sampling is difficult to use due to the difficulty of determining an appropriate sample size. In t…
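To make the sample-size problem in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm (the abstract is truncated before the method is described). It is a generic illustration that contrasts a fixed sample size from Hoeffding's inequality with a sequential rule that stops as soon as the data seen so far justify it. The function names, the batch size, and the union-bound schedule `delta / (t * (t + 1))` are all assumptions made for this example.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Fixed sample size so that |estimate - p| <= epsilon with probability >= 1 - delta.

    Assumes each observation lies in [0, 1] (Hoeffding's inequality).
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def adaptive_estimate(draw, epsilon, delta, batch=100, max_n=1_000_000):
    """Sample in batches until the estimate has relative error <= epsilon.

    `draw` is a hypothetical callable returning one value in [0, 1].
    Returns (estimate, number_of_samples_used).
    """
    total = 0.0
    n = 0
    check = 0
    while n < max_n:
        for _ in range(batch):
            total += draw()
        n += batch
        check += 1
        # Split delta across checks; sum over t of delta / (t * (t + 1)) equals delta.
        delta_t = delta / (check * (check + 1))
        radius = math.sqrt(math.log(2.0 / delta_t) / (2.0 * n))
        mean = total / n
        # If |mean - p| <= radius, then radius <= epsilon * (mean - radius)
        # implies |mean - p| <= epsilon * p, so it is safe to stop.
        if radius <= epsilon * (mean - radius):
            break
    return total / n, n


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.5, 0.1):
        est, used = adaptive_estimate(lambda: 1.0 if rng.random() < p else 0.0, 0.2, 0.05)
        print(f"p={p}: estimate={est:.3f} after {used} samples")
```

The point of the contrast is that the fixed bound must be chosen before any data are seen, whereas the sequential rule lets the required sample size depend on the quantity being estimated: a frequent event is resolved after far fewer draws than a rare one.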

Citations: cited by 57 publications (57 citation statements)
References: 12 publications
“…However, they tested this method with only one candidate itemset. Thus, as suggested in [8,9], more works need to be done to combine this adaptive sampling method with some mining algorithms, such as Apriori, and more experiments are required to test the cost of on-line sampling.…”
Section: Sequential Sequence Mining Algorithms (mentioning)
confidence: 99%
“…MSPX can avoid or alleviate some problems inherent in the traditional single-sample methods [5,8,9,12,17,20,21]: (1) The performance of single-sample methods usually varies considerably from one run to another for the same mining task, because a bad sample can degrade the overall performance of the mining. On the other hand, by using multiple samples, MSPX effectively prevents the candidate generation from the overestimates made by a bad sample, so its performance is much more stable.…”
Section: Sampling In Mspx (mentioning)
confidence: 99%
“…A variety of procedures for selecting subsets from a large dataset are studied in [6]; and the results of using different techniques are empirically compared. A sequential sampling method for determining appropriate sample sizes for data reduction is proposed in [7]. The afore-mentioned data reduction approaches are mainly based on statistical sampling techniques, such as simple random sampling, stratified sampling or cluster sampling.…”
Section: Data Reduction: An Overview (mentioning)
confidence: 99%