Data mining is a useful decision support technique, which can be used to find trends and regularities in warehouses of corporate data. A serious problem of its practical applications is long processing time required by data mining algorithms. Current systems consume minutes or hours to answer simple queries. In this paper we present the concept of materialized data mining views. Materialized data mining views store selected patterns discovered in a portion of a database, and are used for query rewriting, which transforms a data mining query into a query accessing a materialized view. Since the transformation is transparent to a user, materialized data mining views can be created and used like indexes.
Abstract. Data clustering methods have many applications in the area of data mining. Traditional clustering algorithms deal with quantitative or categorical data points. However, there exist many important databases that store categorical data sequences, where significant knowledge is hidden behind sequential dependencies between the data. In this paper we introduce a problem of clustering categorical data sequences and present an efficient scalable algorithm to solve the problem. Our algorithm implements the general idea of agglomerative hierarchical clustering and uses frequently occurring subsequences as features describing data sequences. The algorithm not only discovers a set of high quality clusters containing similar data sequences but also provides descriptions of the discovered clusters.
Abstract. Many data mining techniques consist in discovering patterns frequently occurring in the source dataset. Typically, the goal is to discover all the patterns whose frequency in the dataset exceeds a userspecified threshold. However, very often users want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed-up the mining process. In this paper, we focus on improving the efficiency of constraint-based frequent pattern mining by using dataset filtering techniques. Dataset filtering conceptually transforms a given data mining task into an equivalent one operating on a smaller dataset. We present transformation rules for various classes of patterns: itemsets, association rules, and sequential patterns, and discuss implementation issues regarding integration of dataset filtering with well-known pattern discovery algorithms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.