Diverse subgroup set discovery

Leeuwen, Matthijs van; Knobbe, Arno

doi:10.1007/s10618-012-0273-y

Cited by 92 publications

(96 citation statements)

References 40 publications

(53 reference statements)

“…[14,16]. The aim is to find groups of objects, called subgroups, for which the distribution over the labels is statistically different from that of the entire set of objects.…”

Section: Subgroup Discovery (mentioning)

confidence: 99%

Local Subgroup Discovery for Eliciting and Understanding New Structure-Odor Relationships

et al. 2016

Abstract. From a molecule to the brain perception, olfaction is a complex phenomenon that remains to be fully understood in neuroscience. A challenge is to establish comprehensive rules between the physicochemical properties of the molecules (e.g., weight, atom counts) and specific and small subsets of olfactory qualities (e.g., fruity, woody). This problem is particularly difficult as the current knowledge states that molecular properties only account for 30% of the identity of an odor: predictive models are found lacking in providing universal rules. However, descriptive approaches make it possible to elicit local hypotheses, validated by domain experts, to understand the olfactory percept. Based on a new quality measure tailored for multi-labeled data with skewed distributions, our approach extracts the top-k non-redundant subgroups interpreted as descriptive rules description → {subset of labels}. Our experiments on benchmark and olfaction datasets demonstrate the capabilities of our approach with direct applications for the perfume and flavor industries.

“…IDSD builds upon Diverse Subgroup Set Discovery (DSSD) [25]. DSSD was proposed in an attempt to eliminate redundancy by using a diverse beam search.…”

Section: Integrating Interaction Into Search (mentioning)

confidence: 99%

“…For the setting without interaction, DSSD [25] was used with its default parameter settings (Table 1(a)). The results suffer from two severe problems.…”

Section: Case Study: Sports Analytics (mentioning)

confidence: 99%

Interactive Data Exploration Using Pattern Mining

Leeuwen

2014

Interactive Knowledge Discovery and Data Mining in Biomedical Informatics

Self Cite

Abstract. We live in the era of data and need tools to discover valuable information in large amounts of data. The goal of exploratory data mining is to provide as much insight in given data as possible. Within this field, pattern set mining aims at revealing structure in the form of sets of patterns. Although pattern set mining has been shown to be an effective solution to the infamous pattern explosion, important challenges remain. One of the key challenges is to develop principled methods that allow user- and task-specific information to be taken into account, by directly involving the user in the discovery process. This way, the resulting patterns will be more relevant and interesting to the user. To achieve this, pattern mining algorithms will need to be combined with techniques from both visualisation and human-computer interaction. Another challenge is to establish techniques that perform well under constrained resources, as existing methods are usually computationally intensive. Consequently, they are only applied to relatively small datasets and on fast computers. The ultimate goal is to make pattern mining practically more useful, by enabling the user to interactively explore the data and identify interesting structure. In this paper we describe the state-of-the-art, discuss open problems, and outline promising future directions.

“…Recent works [12,9,17] propose approaches to solve the redundancy and trivial patterns issues in frequent pattern mining. For instance, the first two works focus on finding a relevant or concise representation of sets of frequent patterns.…”

Section: Related Work (mentioning)

confidence: 99%

Mining Top-K Largest Tiles in a Data Stream

Lam, Pei, Prado, et al. 2014

Machine Learning and Knowledge Discovery in Databases

Abstract. Large tiles in a database are itemsets with the largest area, which is defined as the itemset frequency in the database multiplied by its size. Mining these large tiles is an important pattern mining problem since tiles with a large area describe a large part of the database. In this paper, we introduce the problem of mining top-k largest tiles in a data stream under the sliding window model. We propose a candidate-based approach which summarizes the data stream and produces the top-k largest tiles efficiently for moderate window size. We also propose an approximation algorithm with theoretical bounds on the error rate to cope with large size windows. In the experiments with two real-life datasets, the approximation algorithm is up to a hundred times faster than the candidate-based solution and the baseline algorithms based on the state-of-the-art solutions. We also investigate an application of large tile mining in computer vision and in emerging search topics monitoring.
