The Chosen Few: On Identifying Valuable Patterns

Bringmann, Björn; Zimmermann, Albrecht

doi:10.1109/icdm.2007.85

Cited by 65 publications

(66 citation statements)

References 9 publications

“…Even when using condensed representations (Mannila and Toivonen 1996; Pasquier et al 1999) or some form of pattern set selection (Bringmann and Zimmermann 2007; Knobbe and Ho 2006b; Peng et al 2005) as a post-processing step, the end result may still be unrealistically large, and represent tiny details of the data overly specifically. The experienced user of discovery algorithms will recognise the large level of redundancy that is common in the final pattern set.…”

Section: Introduction (mentioning)

confidence: 99%

“…Due to the above-mentioned risk of redundancy with top-k selection, the level of exploration within a beam can become limited, which will adversely affect the quality of the end result. Inspiration for selecting a diverse collection of patterns for the beam at each search level will come from pattern set selection techniques (Bringmann and Zimmermann 2007; Knobbe and Ho 2006b; Peng et al 2005), which were originally designed for post-processing the end-result of discovery algorithms. …”

Section: Introduction (mentioning)

confidence: 99%

Diverse subgroup set discovery

Leeuwen; Knobbe. 2012. Data Min Knowl Disc

Large data is challenging for most existing discovery algorithms, for several reasons. First of all, such data leads to enormous hypothesis spaces, making exhaustive search infeasible. Second, many variants of essentially the same pattern exist, due to (numeric) attributes of high cardinality, correlated attributes, and so on. This causes top-k mining algorithms to return highly redundant result sets, while ignoring many potentially interesting results. These problems are particularly apparent with subgroup discovery (SD) and its generalisation, exceptional model mining. To address this, we introduce subgroup set discovery: one should not consider individual subgroups, but sets of subgroups. We consider three degrees of redundancy, and propose corresponding heuristic selection strategies in order to eliminate redundancy. By incorporating these (generic) subgroup selection methods in a beam search, the aim is to improve the balance between exploration and exploitation. The proposed algorithm, dubbed DSSD for diverse subgroup set discovery, is experimentally evaluated and compared to existing approaches. For this, a variety of target types with corresponding datasets and quality measures is used. The subgroup sets that are discovered by the competing methods are evaluated primarily on the following three criteria: (1) diversity in the subgroup covers (exploration), (2) the maximum quality found (exploitation), and (3) runtime. The results show that DSSD outperforms each traditional SD method on all or a (non-empty) subset of these criteria, depending on the specific setting. The more complex the task, the larger the benefit of using our diverse heuristic search turns out to be.

“…However, there is no efficient way to generate a set of patterns that can satisfy global constraints (e.g. cover all data while giving good prediction accuracy) [3] and all of the mentioned methods require costly or ad-hoc post-processing stages for selecting the patterns. In contrast, the proposed histogram of pattern sets does capture the discriminative power of the whole set of patterns and provides an efficient image representation for supervised tasks.…”

Section: Related Work (mentioning)

confidence: 99%

Histograms of Pattern Sets for Image Classification and Object Recognition

Voravuthikunchai; Crémilleux; Jurie. 2014. 2014 IEEE Conference on Computer Vision and Pattern Recognition

This paper introduces a novel image representation capturing feature dependencies through the mining of meaningful combinations of visual features. This representation leads to a compact and discriminative encoding of images that can be used for image classification, object detection or object recognition. The method relies on (i) multiple random projections of the input space followed by local binarization of projected histograms encoded as sets of items, and (ii) the representation of images as Histograms of Pattern Sets (HoPS). The approach is validated on four publicly available datasets (Daimler Pedestrian, Oxford Flowers, KTH Texture and PASCAL VOC2007), allowing comparisons with many recent approaches. The proposed image representation reaches state-of-the-art performance on each one of these datasets.

“…For instance, CBA [1] first computes all frequent itemsets (with their most frequent class label) and then induces an ordered rule-list classifier by removing redundant itemsets. Several alternative techniques (for instance, [28,30]) define measures of redundancy and ways to select only a limited number of patterns. Constructing a concise pattern set for use in classification can be seen as a form of feature selection.…”

Section: Global Heuristic Two Step Techniques (mentioning)

confidence: 99%

“…While [9] used an exhaustive two step approach to finding pattern sets, there are numerous heuristic approaches to finding global pattern sets that first perform a local pattern mining step and then heuristically post-process the result ([1,28]) (see [29] for an overview). Thus the second step does not guarantee that the optimal solutions are found.…”

Section: Global Heuristic Two Step Techniques (mentioning)

confidence: 99%

k-Pattern Set Mining under Constraints

Guns; Nijssen; Raedt. 2013. IEEE Trans. Knowl. Data Eng.

We introduce the k-pattern set mining problem, which is concerned with finding sets of k patterns that satisfy constraints. We formulate a number of such constraints, both at the local level, that is, on individual patterns, and more importantly, also on the global level, that is, on the overall pattern set. The resulting framework is flexible and generic in the sense that it can be instantiated to a wide variety of well-known mining tasks including concept-learning, rule-learning, redescription mining, conceptual clustering and tiling. We present a solution method based on constraint programming and discuss how many problems can be modelled in a constraint programming system. Finally, a number of experiments show the promise and generality of the approach.
