Correlation is usually defined for real-valued sequences. In data mining, however, values may be of various types: real, nominal, or ordinal. This paper reviews methods for measuring the correlation between multivariable sequences of data regardless of value type, and proposes a new method for measuring the statistical correlation of multivariable sequences. Because the method relies on the geometric meaning of the dot product to obtain the degree of multivariable correlation, it is called M-correlation. In this paper, M-correlation is used to prune redundant association rules. To improve mining efficiency, a novel algorithm, FT-Miner, is presented to find all frequent subtrees in a forest using two new data structures, UFP-Tree and FP-Forest. Experiments show that the algorithm not only eliminates many useless rules but also performs better than classical algorithms.
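To illustrate the dot-product view of correlation that the abstract refers to, the following Python sketch computes the cosine of the angle between two mean-centered real-valued sequences. For real values, this cosine coincides with Pearson's correlation coefficient. It is not the paper's M-correlation: the abstract does not give the multivariable definition or the treatment of nominal and ordinal values, and the function names `centered` and `cosine_correlation` are hypothetical.

```python
import math
from typing import Sequence


def centered(xs: Sequence[float]) -> list[float]:
    """Subtract the sample mean so the vector is measured from the centroid."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def cosine_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cosine of the angle between two mean-centered vectors.

    Hypothetical helper, not the paper's M-correlation. For real-valued
    sequences this equals Pearson's correlation coefficient.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    a, b = centered(xs), centered(ys)
    # Dot product of the centered vectors: |a| |b| cos(theta).
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        # A constant sequence centers to the zero vector, so the angle is undefined.
        raise ValueError("correlation is undefined for a constant sequence")
    return dot / (norm_a * norm_b)
```

For example, `cosine_correlation([1, 2, 3], [1, 3, 2])` returns approximately 0.5: the centered vectors (-1, 0, 1) and (-1, 1, 0) have dot product 1 and norms of √2 each. Values near 1 or -1 indicate vectors pointing in nearly the same or opposite directions.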