Querying multiple sets of discovered rules

Tuzhilin, Alexander; Liu, Bing

doi:10.1145/775047.775055

Cited by 35 publications (15 citation statements)

References 16 publications


“…Winarko and Roddick [23] use signature files to index temporal patterns. Tuzhilin et al [20] use B+ trees to index the support and the confidence of the rules and use inverted files to index patterns. In Opportunity Map [12], a hash-tree is used to store classification rules.…”

Section: Related Work (mentioning)
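The indexing scheme attributed to Tuzhilin et al. in the statement above can be pictured with a small in-memory sketch. It is not the paper's implementation: a Python dict mapping each item to a set of rule ids stands in for the disk-based inverted file, and a sorted list searched with `bisect` stands in for the B+ tree on support (a second sorted list would handle confidence the same way). The names `Rule`, `RuleIndex`, `rules_with_item`, and `rules_by_support` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: antecedent => consequent with its measures."""
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float


class RuleIndex:
    """In-memory stand-in for an inverted file on items plus a sorted index on support."""

    def __init__(self, rules):
        self.rules = list(rules)
        # Inverted file: item -> ids of the rules that mention it on either side.
        self.postings = defaultdict(set)
        for rid, rule in enumerate(self.rules):
            for item in rule.antecedent | rule.consequent:
                self.postings[item].add(rid)
        # Sorted (support, id) pairs play the role of a B+ tree's leaf level.
        self.by_support = sorted((r.support, rid) for rid, r in enumerate(self.rules))
        self.support_keys = [s for s, _ in self.by_support]

    def rules_with_item(self, item):
        """Ids of rules whose antecedent or consequent contains `item`."""
        return sorted(self.postings.get(item, ()))

    def rules_by_support(self, low, high):
        """Ids of rules with low <= support <= high, located by binary search."""
        start = bisect_left(self.support_keys, low)
        end = bisect_right(self.support_keys, high)
        return sorted(rid for _, rid in self.by_support[start:end])


if __name__ == "__main__":
    rules = [
        Rule(frozenset({"bread"}), frozenset({"milk"}), 0.30, 0.80),
        Rule(frozenset({"beer"}), frozenset({"chips"}), 0.10, 0.60),
        Rule(frozenset({"bread", "butter"}), frozenset({"milk"}), 0.20, 0.90),
    ]
    index = RuleIndex(rules)
    print(index.rules_with_item("milk"))       # [0, 2]
    print(index.rules_by_support(0.15, 0.35))  # [0, 2]
```

The binary search gives the same logarithmic range lookup a B+ tree provides, without the paging and node splitting that a disk-resident structure needs.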


“…Only a few papers address this problem [16,1,20,15], and the techniques proposed in these papers have not been comprehensively compared. In this paper, we study the performance of three structures for indexing and querying frequent itemsets.…”

Section: Introduction (mentioning)


“…Two index structures, inverted files and signature files, are two classical structures for indexing set-valued data. They are employed for indexing frequent itemsets in [16,20]. We make some modifications to the two structures to make them more suitable for indexing frequent itemsets.…”

Section: Introduction (mentioning)
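The classical signature file mentioned here can be sketched with superimposed coding: each item switches on a few bits of a fixed-width signature, and a set's signature is the bitwise OR of its items' signatures. The parameters below (64-bit signatures, two bits per item, `zlib.crc32` as the hash) are assumptions, and this is the textbook structure rather than the modified version the citing paper proposes. Because bit collisions can produce false drops, every candidate that passes the bitwise filter is verified against the stored itemset.

```python
import zlib

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_ITEM = 2   # assumed number of bits each item switches on


def item_signature(item: str) -> int:
    """Set BITS_PER_ITEM pseudo-random bits for one item (CRC-32 used as the hash)."""
    sig = 0
    for k in range(BITS_PER_ITEM):
        h = zlib.crc32(f"{k}:{item}".encode("utf-8"))
        sig |= 1 << (h % SIG_BITS)
    return sig


def set_signature(items) -> int:
    """Superimposed coding: OR together the signatures of all items in the set."""
    sig = 0
    for item in items:
        sig |= item_signature(item)
    return sig


class SignatureFile:
    """Sequential signature file over itemsets, with exact verification of candidates."""

    def __init__(self, itemsets):
        self.itemsets = [frozenset(s) for s in itemsets]
        self.signatures = [set_signature(s) for s in self.itemsets]

    def supersets_of(self, query):
        q = frozenset(query)
        q_sig = set_signature(q)
        # Filter: a superset's signature must cover every bit of q_sig.
        # Verify: q <= s discards false drops caused by bit collisions.
        return [i for i, (s, sig) in enumerate(zip(self.itemsets, self.signatures))
                if (sig & q_sig) == q_sig and q <= s]

    def subsets_of(self, query):
        q = frozenset(query)
        q_sig = set_signature(q)
        # Filter: a subset's signature may not have any bit outside q_sig.
        return [i for i, (s, sig) in enumerate(zip(self.itemsets, self.signatures))
                if (sig & ~q_sig) == 0 and s <= q]


if __name__ == "__main__":
    sf = SignatureFile([{"a", "b"}, {"a", "b", "c"}, {"c", "d"}])
    print(sf.supersets_of({"a", "b"}))
    print(sf.subsets_of({"a", "b", "c"}))
```

The filter is only a necessary condition, so results stay exact; its value is that non-matching itemsets can often be rejected with a single bitwise test instead of a set comparison.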



A performance study of three disk-based structures for indexing and querying frequent itemsets

2013


Frequent itemset mining is an important problem in the data mining area. Extensive efforts have been devoted to developing efficient algorithms for mining frequent itemsets. However, little attention has been paid to managing the large collections of frequent itemsets produced by these algorithms for subsequent analysis and user exploration. In this paper, we study three structures for indexing and querying frequent itemsets: inverted files, signature files, and the CFP-tree. The first two structures have been widely used for indexing general set-valued data; we make some modifications to make them more suitable for indexing frequent itemsets. The CFP-tree structure is specially designed for storing frequent itemsets, and we add a pruning technique based on length-2 frequent itemsets to make it more efficient for processing superset queries. We study the performance of the three structures in supporting five types of containment queries: exact match, subset/superset search, and immediate subset/superset search. Our results show that no structure outperforms the others for all five types of queries on all datasets. The CFP-tree shows better overall performance than the other two structures.
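The five containment queries named in the abstract can be pinned down as set predicates. The sketch below answers them with a naive linear scan, which is exactly the cost that inverted files, signature files, and the CFP-tree aim to avoid; it only fixes the semantics. It assumes that subset and superset search are inclusive of the query itself and that "immediate" subsets and supersets are those differing from the query by exactly one item, which may not match the paper's precise definitions. All function names are hypothetical.

```python
def exact_match(itemsets, query):
    """Itemsets equal to the query."""
    q = frozenset(query)
    return [s for s in itemsets if s == q]


def subset_search(itemsets, query):
    """Itemsets contained in the query (inclusive, by assumption)."""
    q = frozenset(query)
    return [s for s in itemsets if s <= q]


def superset_search(itemsets, query):
    """Itemsets containing the query (inclusive, by assumption)."""
    q = frozenset(query)
    return [s for s in itemsets if s >= q]


def immediate_subsets(itemsets, query):
    """Proper subsets of the query with exactly one item fewer (assumed definition)."""
    q = frozenset(query)
    return [s for s in itemsets if s < q and len(s) == len(q) - 1]


def immediate_supersets(itemsets, query):
    """Proper supersets of the query with exactly one item more (assumed definition)."""
    q = frozenset(query)
    return [s for s in itemsets if s > q and len(s) == len(q) + 1]


if __name__ == "__main__":
    itemsets = [frozenset(x) for x in ({"a"}, {"b"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"})]
    print(immediate_supersets(itemsets, {"a"}))
```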


“…In this domain, few works have been done nowadays. The most significant one is probably RULE-QL (Tuzhilin and Liu, 2002) which proposes an extension of SQL allowing some kind of management for association rules. Nevertheless, such a language only offers some very basic functionalities such as accessing some parts of rules, searching for rules containing a particular item in their left or right part, etc.…”

Section: Introduction (mentioning)


Efficient Management of Non Redundant Rules in Large Pattern Bases: A Bitmap Approach

Jacquenet, Largeron, Udréa

2006

Proceedings of the Eighth International Conference on Enterprise Information Systems


Abstract: Knowledge Discovery from Databases has an increasing impact today, and various tools are now available to extract knowledge efficiently (in time and memory space) from huge databases. Nevertheless, these systems generally produce large pattern bases whose management rapidly becomes intractable. Few works have focused on pattern base management systems, and research in this domain is very recent. This paper falls within that context and deals with a particular class of patterns, namely association rules. More precisely, we present how we have efficiently implemented the search for non-redundant rules by representing rules as bitmap arrays. Experiments show that this technique yields dramatic gains in time and space, allowing us to manage large pattern bases.
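The abstract does not say how the bitmap arrays are laid out or which notion of redundancy is used, so the sketch below assumes both. Each side of a rule is a bitmap over a fixed item numbering (Python integers serve as arbitrary-length bit arrays), and, following one common definition, a rule is treated as redundant when another rule with the same support and confidence has an antecedent contained in its antecedent and a consequent containing its consequent. What it illustrates is that each subset test reduces to one AND and one NOT on bitmaps. `BitRule`, `to_bitmap`, `is_subset`, and `non_redundant` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitRule:
    lhs: int          # bit i set <=> item i is in the antecedent
    rhs: int          # bit i set <=> item i is in the consequent
    support: float
    confidence: float


def to_bitmap(items, item_ids):
    """Encode a set of items as an integer bitmap using a fixed item -> bit mapping."""
    bits = 0
    for item in items:
        bits |= 1 << item_ids[item]
    return bits


def is_subset(a: int, b: int) -> bool:
    """a is a subset of b iff no bit of a falls outside b."""
    return (a & ~b) == 0


def non_redundant(rules):
    """Keep rules not dominated by a more general rule with identical measures."""
    kept = []
    for r in rules:
        # Measures are compared exactly; assumes they were computed from the same counts.
        redundant = any(
            (o.lhs, o.rhs) != (r.lhs, r.rhs)
            and o.support == r.support
            and o.confidence == r.confidence
            and is_subset(o.lhs, r.lhs)
            and is_subset(r.rhs, o.rhs)
            for o in rules
        )
        if not redundant:
            kept.append(r)
    return kept


if __name__ == "__main__":
    ids = {"a": 0, "b": 1, "c": 2}
    r1 = BitRule(to_bitmap({"a"}, ids), to_bitmap({"b", "c"}, ids), 0.2, 0.5)
    r2 = BitRule(to_bitmap({"a", "b"}, ids), to_bitmap({"c"}, ids), 0.2, 0.5)
    r3 = BitRule(to_bitmap({"b"}, ids), to_bitmap({"a"}, ids), 0.3, 0.7)
    # r2 (a, b => c) is dominated by r1 (a => b, c), which has the same measures.
    print(non_redundant([r1, r2, r3]) == [r1, r3])  # True
```

The pairwise loop is quadratic in the number of rules; it is meant to show the bitmap tests, not an efficient search strategy.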


“…A solution can come from query languages dedicated to pattern database manipulations. It is the case of RULE-QL (Tuzhilin and Liu, 2002) which extends SQL with operators allowing to access rules components and to specify subset relationships. It is thus easier to write queries that, for instance, select rules that have a left part contained in the consequent of another rule.…”

Section: A Critical Evaluation (mentioning)
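The example query in the statement above, selecting rules whose left part is contained in the consequent of another rule, amounts to a self-join on the rule set with a subset predicate. The sketch uses plain Python sets rather than RULE-QL, whose syntax is not reproduced here; rules are assumed to be `(antecedent, consequent)` pairs of frozensets, and `lhs_contained_in_other_rhs` is a hypothetical name.

```python
def lhs_contained_in_other_rhs(rules):
    """Return (i, j) pairs, i != j, where rule i's antecedent is a subset of rule j's consequent."""
    return [
        (i, j)
        for i, (lhs_i, _rhs_i) in enumerate(rules)
        for j, (_lhs_j, rhs_j) in enumerate(rules)
        if i != j and lhs_i <= rhs_j
    ]


if __name__ == "__main__":
    rules = [
        (frozenset({"a"}), frozenset({"b", "c"})),
        (frozenset({"b"}), frozenset({"d"})),
        (frozenset({"b", "c"}), frozenset({"a"})),
    ]
    print(lhs_contained_in_other_rhs(rules))  # [(0, 2), (1, 0), (2, 0)]
```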


Data Mining Query Languages

Boulicaut, Masson

Data Mining and Knowledge Discovery Handbook


Summary. Many Data Mining algorithms enable the extraction of different types of patterns from data (e.g., local patterns like itemsets and association rules, models like classifiers). To support the whole knowledge discovery process, we need integrated systems that can deal with both patterns and data. The inductive database approach has emerged as a unifying framework for such systems. Following this database perspective, knowledge discovery processes become querying processes for which query languages have to be designed. In the prolific field of association rule mining, different query languages have been proposed to support the more or less declarative specification of both data and pattern manipulations. In this chapter, we survey some of these proposals. This enables us to identify current shortcomings and to point out some promising directions of research in this area.

Key words: Query languages, Association Rules, Inductive Databases.

The Need for Data Mining Query Languages

Since the first definition of the Knowledge Discovery in Databases (KDD) domain in (Piatetsky-Shapiro and Frawley, 1991), many techniques have been proposed to support these complex, interactive, and iterative "From Data to Knowledge" processes. In practice, knowledge elicitation is based on some extracted and materialized (collections of) patterns, which can be global (e.g., decision trees) or local (e.g., itemsets, association rules). Real-life KDD processes involve complex pre-processing manipulations (e.g., to clean the data), several extraction steps with different parameters and types of patterns (e.g., feature construction by means of constrained itemsets followed by a classification phase, association rule mining for different threshold values and different objective measures of interestingness), and post-processing manipulations (e.g., elimination of redundancy in extracted patterns, crossing-over operations between patterns and data, like the search for transactions that are exceptions to frequent and valid association rules or the selection of examples misclassified by a decision tree). Looking for a tighter integration between data and the patterns that hold in the data, Imielinski and Mannila proposed the concept of inductive database (IDB) in (Imielinski and Mannila, 1996). In an IDB, ordinary queries can be used to access and manipulate data, while inductive queries can be used to generate (mine), manipulate, and apply patterns. KDD becomes …

O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed.,

