Mining association rules from large databases of business data is an important topic in data mining. In many applications, there are explicit or implicit taxonomies (hierarchies) for items, so it may be useful to find associations at levels of the taxonomy other than the primitive concept level. Previous work on the mining of generalized association rules, however, assumed that the taxonomy of items remained unchanged, disregarding the fact that the taxonomy might be updated as new transactions are added to the database over time. If this happens, effectively updating the generalized association rules to reflect the database change and related taxonomy evolution is a crucial task. In this paper, we examine this problem and propose two novel algorithms, called IDTE and IDTE2, which can incrementally update the generalized association rules when the taxonomy of items evolves as a result of new transactions. Empirical evaluations show that our algorithms can maintain their performance even for large numbers of incremental transactions and high degrees of taxonomy evolution, and are faster than applying contemporary generalized association mining algorithms to the whole updated database.
Mining generalized association rules among items in the presence of taxonomies has been recognized as an important model for data mining. Earlier work on mining generalized association rules, however, required the taxonomies to be static, ignoring the fact that the taxonomies of items do not necessarily remain unchanged. For instance, some items may be reclassified from one hierarchy tree to another for more suitable classification, removed from the taxonomies if they will no longer be produced, or added to the taxonomies as new items. Additionally, analysts might have to dynamically adjust the taxonomies from different viewpoints so as to discover more informative rules. Under these circumstances, effectively updating the discovered generalized association rules is a crucial task. In this paper, we examine this problem and propose two novel algorithms, called Diff_ET and Diff_ET2, to update the discovered frequent itemsets. Empirical evaluation shows that the proposed algorithms are very effective and have good linear scale-up characteristics.
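To illustrate why a taxonomy change can invalidate previously discovered frequent itemsets, the following is a minimal sketch of support counting for generalized itemsets. It is not an implementation of IDTE, IDTE2, Diff_ET, or Diff_ET2; it simply recomputes support from scratch, which is the baseline those algorithms aim to avoid. The toy transactions, the item names, and the helper functions `ancestors`, `extend`, and `support` are all hypothetical, and the taxonomy is assumed to be an acyclic child-to-parent mapping.

```python
def ancestors(item, parent):
    """Return all ancestors of item under a child -> parent mapping (assumed acyclic)."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def extend(transaction, parent):
    """Return the transaction plus every ancestor of each of its items."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def support(itemset, transactions, parent):
    """Fraction of transactions whose extended form contains the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= extend(t, parent))
    return hits / len(transactions)


# Hypothetical toy database of primitive-level transactions.
transactions = [
    {"laptop", "mouse"},
    {"desktop", "mouse"},
    {"tablet"},
    {"laptop", "printer"},
]

# Hypothetical original taxonomy: "tablet" is classified under "accessory".
old_taxonomy = {
    "laptop": "computer",
    "desktop": "computer",
    "tablet": "accessory",
    "mouse": "accessory",
    "printer": "accessory",
}

# Evolved taxonomy: "tablet" is reclassified under "computer".
new_taxonomy = dict(old_taxonomy, tablet="computer")

old_support = support({"computer"}, transactions, old_taxonomy)
new_support = support({"computer"}, transactions, new_taxonomy)
print(old_support, new_support)
```

In this toy setting, reclassifying a single item changes the support of the generalized item `computer` even though no transaction was added or removed, so any rules derived under the old taxonomy would need to be revisited.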