Daniel Paurat scite author profile

We present several exact and highly scalable local pattern sampling algorithms. They can be used as an alternative to exhaustive local pattern discovery methods (e.g, frequent set mining or optimistic-estimator-based subgroup discovery) and can substantially improve efficiency as well as controllability of pattern discovery processes. While previous sampling approaches mainly rely on theMarkov chainMonte Carlo method, our procedures are direct, i.e., non processsimulating, sampling algorithms. The advantages of these direct methods are an almost optimal time complexity per pattern as well as an exactly controlled distribution of the produced patterns. Namely, the proposed algorithms can sample (item-)sets according to frequency, area, squared frequency, and a class discriminativity measure. Experiments demonstrate that these procedures can improve the accuracy of pattern-based models similar to frequent sets and often also lead to substantial gains in terms of scalability. Copyright 2011 ACM

show abstract

Fast and Memory-Efficient Discovery of the Top-k Relevant Subgroups in a Reduced Candidate Space

Großkreutz

Paurat

2011

View full text Add to dashboard Cite

We consider a modified version of the top-k subgroup discovery task, where subgroups dominated by other subgroups are discarded. The advantage of this modified task, known as relevant subgroup discovery, is that it avoids redundancy in the outcome. Although it has been applied in many applications, so far no efficient exact algorithm for this task has been proposed. Most existing solutions do not guarantee the exact solution (as a result of the use of non-admissible heuristics), while the only exact solution relies on the explicit storage of the whole search space, which results in prohibitively large memory requirements.In this paper, we present a new top-k relevant subgroup discovery algorithm which overcomes these shortcomings. Our solution is based on the fact that if an iterative deepening approach is applied, the relevance check -which is the root of the problems of all other approaches -can be realized based solely on the best k subgroups visited so far. The approach also allows for the integration of admissible pruning techniques like optimistic estimate pruning. The result is a fast, memory-efficient algorithm which clearly outperforms existing top-k relevant subgroup discovery approaches. Moreover, we analytically and empirically show that it is competitive with simpler approaches which do not consider the relevance criterion.

show abstract

InVis: A Tool for Interactive Visual Data Analysis

Paurat

Gärtner

2013

View full text Add to dashboard Cite

We present InVis, a tool to visually analyse data by interactively shaping a two dimensional embedding of it. Traditionally, embedding techniques focus on finding one fixed embedding, which emphasizes a single aspects of the data. In contrast, our application enables the user to explore the structures of a dataset by observing and controlling a projection of it. Ultimately it provides a way to search and find an embedding, emphasizing aspects that the user desires to highlight.

show abstract

Interactive Knowledge-Based Kernel PCA

Oglic

Paurat

Gärtner

2014

View full text Add to dashboard Cite

Abstract. Data understanding is an iterative process in which domain experts combine their knowledge with the data at hand to explore and confirm hypotheses. One important set of tools for exploring hypotheses about data are visualizations. Often, however, traditional, unsupervised dimensionality reduction algorithms are used for visualization. These tools allow for interaction, i.e., exploring different visualizations, only by means of manipulating some technical parameters of the algorithm. Therefore, instead of being able to intuitively interact with the visualization, domain experts have to learn and argue about these technical parameters. In this paper we propose a knowledge-based kernel PCA approach that allows for intuitive interaction with data visualizations. Each embedding direction is given by a non-convex quadratic optimization problem over an ellipsoid and has a globally optimal solution in the kernel feature space. A solution can be found in polynomial time using the algorithm presented in this paper. To facilitate direct feedback, i.e., updating the whole embedding with a sufficiently high frame-rate during interaction, we reduce the computational complexity further by incremental up-and down-dating. Our empirical evaluation demonstrates the flexibility and utility of this approach.

show abstract

An enhanced relevance criterion for more concise supervised pattern discovery

Großkreutz

Paurat

Rüping

2012

View full text Add to dashboard Cite

Supervised local pattern discovery aims to find subsets of a database with a high statistical unusualness in the distribution of a target attribute. Local pattern discovery is often used to generate a human-understandable representation of the most interesting dependencies in a data set. Hence, the more crisp and concise the output is, the better. Unfortunately, standard algorithm often produce very large and redundant outputs. In this paper, we introduce delta-relevance, a definition of a more strict criterion of relevance. It will allow us to significantly reduce the output space, while being able to guarantee that every local pattern has a delta-relevant representative which is almost as good in a clearly defined sense. We show empirically that delta-relevance leads to a considerable reduction of the amount of returned patterns. We also demonstrate that in a top-k setting, the removal of not delta-relevant patterns improves the quality of the result set

show abstract

Simple Recurrent Neural Networks for Support Vector Machine Training

Sifa

Paurat

Trabold

et al. 2018

View full text Add to dashboard Cite

Corresponding Projections for Orphan Screening

Giesselbach¹,

Ullrich²,

Kamp³

et al. 2018

Preprint

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Daniel Paurat

Direct local pattern sampling by efficient two-step random procedures

Fast and Memory-Efficient Discovery of the Top-k Relevant Subgroups in a Reduced Candidate Space

InVis: A Tool for Interactive Visual Data Analysis

Interactive Knowledge-Based Kernel PCA

An enhanced relevance criterion for more concise supervised pattern discovery

Simple Recurrent Neural Networks for Support Vector Machine Training

Corresponding Projections for Orphan Screening

Contact Info

Product

Resources

About