2006
DOI: 10.1007/11871637_6

SD-Map – A Fast Algorithm for Exhaustive Subgroup Discovery

Abstract: In this paper we present the novel SD-Map algorithm for exhaustive but efficient subgroup discovery. SD-Map guarantees to identify all interesting subgroup patterns contained in a data set, in contrast to heuristic or sampling-based methods. The SD-Map algorithm utilizes the well-known FP-growth method for mining association rules with adaptations for the subgroup discovery task. We show how SD-Map can handle missing values, and provide an experimental evaluation of the performance of the algorithm us…

Citations: cited by 114 publications (110 citation statements)
References: 11 publications (17 reference statements)
“…Such an extended quality function could be defined as $q_a(sd) = |ext(sd)|^a \cdot (t - t_0) \cdot |u(sd)|$, where $|u(sd)|$ is the user count for images in the respective subgroup. Unfortunately, such interestingness measures are not supported by efficient exhaustive algorithms for subgroup discovery, e.g., SD-Map [10] or BSD [11]. On the other hand, more basic algorithms, for example exhaustive depth-first search without a specialized data structure scale not very well for the problem setting of this paper, with thousands of tags as descriptions and possibly millions of instances in an interactive setting.…”
Section: Avoiding User Bias: User-resource Weighting
Citation type: mentioning
confidence: 99%
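The extended quality function quoted above can be evaluated directly once its inputs are known. The following is a minimal sketch, not code from the citing paper: the function and parameter names are hypothetical, and it assumes that t is the target value within the subgroup and t_0 the corresponding value in the whole data set, which the quoted statement does not spell out.

```python
def extended_quality(ext_size: int, t: float, t0: float,
                     user_count: int, a: float = 1.0) -> float:
    """Evaluate q_a(sd) = |ext(sd)|^a * (t - t0) * |u(sd)|.

    ext_size   -- |ext(sd)|, number of instances covered by the subgroup
    t, t0      -- assumed: target value in the subgroup and in the population
    user_count -- |u(sd)|, number of distinct users behind the subgroup
    a          -- exponent weighting the subgroup size
    """
    return (ext_size ** a) * (t - t0) * user_count


# A subgroup of 100 instances contributed by 4 users, with a = 0.5.
score = extended_quality(100, t=0.5, t0=0.25, user_count=4, a=0.5)
print(score)
```

Multiplying by the user count rewards subgroups supported by many distinct users, which matches the stated goal of avoiding user bias.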
“…The Dpsubgroup algorithm is only beaten by Algorithm 1 for sufficiently large differences in the search space. This behavior is due to the sophisticated data structures (fptrees [2,11]) Dpsubgroup uses in contrast to our algorithm. A further noteworthy fact is that unless Algorithm 1 ran out of memory (the oom entries) it always outperforms LCM/greedy.…”
Section: Empirical Evaluation
Citation type: mentioning
confidence: 99%
“…Subgroup discovery [2,12,17] is a local pattern discovery task: descriptions of subpopulations of a database are evaluated against some real-valued quality function, and those descriptions exceeding some given minimum quality are returned to the user. The quality functions commonly used in this course like Piatetsky-Shapiro, binomial test, or Gini-index (see [12] for a list) are functions of the extension of a subgroup description.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
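The task described in this statement can be illustrated with a naive exhaustive enumeration. This is a sketch under stated assumptions, not SD-Map itself: it uses no FP-tree, it assumes a binary target, and all function names and the toy data are hypothetical. The quality function is the Piatetsky-Shapiro measure, n · (p − p0), where n is the subgroup size, p the target share in the subgroup, and p0 the target share in the whole data set.

```python
from itertools import combinations


def piatetsky_shapiro(n: int, p: float, p0: float) -> float:
    """Piatetsky-Shapiro quality: subgroup size times gain in target share."""
    return n * (p - p0)


def discover_subgroups(rows, target, min_quality, max_depth=2):
    """Enumerate all conjunctions of attribute=value selectors up to
    max_depth and return those whose quality exceeds min_quality."""
    p0 = sum(1 for r in rows if r[target]) / len(rows)
    selectors = sorted({(k, v) for r in rows for k, v in r.items() if k != target})
    results = []
    for depth in range(1, max_depth + 1):
        for desc in combinations(selectors, depth):
            # Skip descriptions that constrain the same attribute twice.
            if len({k for k, _ in desc}) < depth:
                continue
            # The extension is the set of rows matching every selector.
            ext = [r for r in rows if all(r.get(k) == v for k, v in desc)]
            if not ext:
                continue
            p = sum(1 for r in ext if r[target]) / len(ext)
            q = piatetsky_shapiro(len(ext), p, p0)
            if q > min_quality:
                results.append((desc, q))
    return sorted(results, key=lambda item: -item[1])


# Toy data: the target "ill" coincides with smoker == "y".
rows = [
    {"sex": "f", "smoker": "y", "ill": True},
    {"sex": "f", "smoker": "n", "ill": False},
    {"sex": "m", "smoker": "y", "ill": True},
    {"sex": "m", "smoker": "n", "ill": False},
]
print(discover_subgroups(rows, target="ill", min_quality=0.75))
```

Because the quality depends only on the rows in the extension, every candidate description requires a pass over the data here; specialized data structures such as FP-trees exist to avoid exactly that cost.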
“…In [29], a recent review describing the SD task, the quality measures used, the approaches and the applications can be found. The SD task is somehow between descriptive and predictive induction, and different algorithms adapting classical algorithms of both classification - as CN2-SD [38] - and association rule learning - as Apriori-SD [33] or SD-MAP [8] - have been proposed. Nowadays, one of the most important aspect in SD is the measures to be used to evaluate the quality of the subgroups extracted.…”
Section: Introduction
Citation type: mentioning
confidence: 99%