Most of the existing association rule mining algorithms are able to extract knowledge from databases with attributes of binary values. However, in real-world applications, databases are usually composed of continuous values such as height, length or weight. If the attributes are continuous, the algorithms are commonly combined with a discretization method that transforms them into discrete attributes. Discretization is the process of dividing the range of a continuous attribute into a finite number of intervals and assigning each interval a discrete numerical value. However, the user most often must specify the number of intervals, or provide heuristic rules to be used during discretization, and it is then difficult to obtain the highest attribute interdependency and the lowest number of intervals at the same time. In this paper we present an association rule mining algorithm suited to the continuous-valued attributes commonly found in scientific and statistical databases. We propose a method using a new graph-based evolutionary algorithm named 'genetic network programming (GNP)' that can deal with continuous values directly, that is, without using any discretization method as a preprocessing step. GNP represents its individuals as graph structures and evolves them in order to find a solution; this feature contributes to creating very compact programs and to implicitly memorizing past action sequences. In the proposed method using GNP, the significance of the extracted association rules is measured with the χ² test, and only important association rules are accumulated in a pool across generations. Results of experiments conducted on a real-life database suggest that the proposed method provides an effective technique for handling continuous attributes.
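As a rough illustration of how the significance of a rule X ⇒ Y can be scored with a χ² test, the sketch below uses the standard χ² statistic for a 2×2 contingency table, written in terms of the supports of X, Y and X ∪ Y. It is not the paper's implementation: the threshold-style conditions on continuous attributes, the attribute names and the function names are assumptions made for the example only. With one degree of freedom, the critical values are 3.841 at the 5% level and 6.635 at the 1% level.

```python
def chi_squared(n, support_x, support_y, support_xy):
    """Chi-squared statistic (1 degree of freedom) for the 2x2 table of X and Y.

    n is the number of records; the supports are fractions in [0, 1].
    """
    denom = support_x * support_y * (1 - support_x) * (1 - support_y)
    if denom == 0:
        # X or Y holds for all records or for none: the statistic is undefined,
        # so this sketch treats the rule as not significant.
        return 0.0
    return n * (support_xy - support_x * support_y) ** 2 / denom


def rule_stats(rows, antecedent, consequent):
    """Support, confidence and chi-squared of the rule antecedent => consequent.

    antecedent and consequent are predicates over one record (hypothetical
    threshold tests on continuous attributes in the example below).
    """
    n = len(rows)
    x = sum(1 for r in rows if antecedent(r)) / n
    y = sum(1 for r in rows if consequent(r)) / n
    z = sum(1 for r in rows if antecedent(r) and consequent(r)) / n
    return {
        "support": z,
        "confidence": z / x if x else 0.0,
        "chi2": chi_squared(n, x, y, z),
    }


# Toy data with two continuous attributes (illustrative values only).
rows = [
    {"height": 182.0, "weight": 81.5},
    {"height": 176.5, "weight": 74.0},
    {"height": 190.1, "weight": 92.3},
    {"height": 171.0, "weight": 70.2},
    {"height": 160.4, "weight": 55.0},
    {"height": 158.9, "weight": 61.7},
    {"height": 165.0, "weight": 58.8},
    {"height": 150.2, "weight": 49.9},
]

stats = rule_stats(
    rows,
    antecedent=lambda r: r["height"] >= 170.0,  # assumed threshold
    consequent=lambda r: r["weight"] >= 70.0,   # assumed threshold
)
significant = stats["chi2"] > 6.635  # 1% level, 1 degree of freedom
```

A rule whose statistic exceeds the chosen critical value would be kept in the pool; how the conditions on continuous attributes are actually formed and evolved by GNP is not specified in the abstract and is left out of this sketch.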
Among the several reported methods for extracting association rules, an evolutionary method named Genetic Network Programming (GNP) has proved effective for small databases, that is, databases with a relatively small number of attributes. However, the conventional GNP method cannot deal with large databases that have a huge number of attributes, because the search space becomes very large and running-time performance degrades. The aim of this paper is to propose a new method for extracting association rules from large and dense databases with a huge number of attributes by combining the conventional GNP-based mining method with a specially designed genetic algorithm (GA). Each of these evolutionary methods works at its own processing level, and the two are tightly synchronized so that they act as one system. Our strategy consists of dividing a large and dense database into many small databases. These small databases are treated as individuals and form a population. The conventional GNP-based mining method is then applied to extract association rules from each individual. Finally, the population is evolved over several generations using a GA whose special genetic operators take the acquired information into account. Two complementary processing levels are defined, the Global Level and the Local Level, each with its own independent tasks and processes. The Global Level mainly carries out the GA process, whereas in the Local Level the conventional GNP-based mining method runs in parallel on the individuals, each run generating its own local pool of association rules. Several special genetic operations for the GA in the Global Level are proposed, and the performance of each of them and of their combination is shown and compared. In our simulations, the conventional GNP-based mining method and the proposed method are compared using a real-world large and dense database with a huge number of attributes. The results show that extending the conventional GNP-based mining method with a GA makes it possible to extract association rules from large and dense databases directly and more efficiently than with the conventional GNP method.
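The first step of this strategy, dividing one wide database into many small ones that form a population, might look roughly like the sketch below. The abstract does not say how the division is done; splitting the attribute columns into disjoint random groups is an assumption of this example, as are the function names `split_attributes` and `project`. The GNP mining step and the GA operators are deliberately not shown.

```python
import random


def split_attributes(attributes, group_size, seed=0):
    """Shuffle the attribute names and cut them into disjoint groups.

    Each group stands for one 'small database' (one individual of the
    population). Assumption: the division is by attribute columns.
    """
    rng = random.Random(seed)
    shuffled = list(attributes)
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]


def project(rows, group):
    """Restrict every record to the attributes of one group."""
    return [{name: row[name] for name in group} for row in rows]


# Toy database: 3 records over 10 binary attributes (illustrative values only).
attributes = [f"A{i}" for i in range(10)]
rows = [{name: (i + j) % 2 for j, name in enumerate(attributes)} for i in range(3)]

groups = split_attributes(attributes, group_size=4)
population = [project(rows, group) for group in groups]
# Each element of `population` would be mined separately at the Local Level,
# while the Global Level GA would recombine the groups between generations.
```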
The idea of combining association rule mining with fuzzy set theory has been applied frequently in recent years [1][2][3][4][5]. It originates from the handling of quantitative attributes in a database, where discretizing the quantitative values into intervals leads to under- or overestimation of values that lie near the interval borders. This is called the sharp boundary problem. Fuzzy sets help to overcome this problem by allowing different degrees of membership, not only the values 1 and 0 used by traditional methods. An attribute value can thereby be a member of more than one set, which gives a more realistic view of such data. In addition, fuzzy set theory has been shown to be a very useful tool in association rule mining because the mined rules can be expressed in linguistic terms, which are more natural and understandable for human beings. The linguistic representation is especially useful when the discovered rules are presented to human experts for study. In this paper, a novel association rule mining approach that integrates the evolutionary optimization technique 'genetic network programming (GNP)' and fuzzy set theory is proposed for mining interesting fuzzy rules from given quantitative data. The performance of our algorithm has been compared with that of other relevant algorithms, and the experimental results show the advantages and effectiveness of the proposed model.
In recent years, several association rule-based classification methods have been proposed; these algorithms can quickly generate accurate rules. However, the generated rule sets are often very large, and the rules are usually complex and hard for users to understand. Among all the rules generated by the algorithms, only some are likely to be of any interest to the domain expert analyzing the data. Most of the rules are either redundant, irrelevant or obvious. In this paper, a new method for selecting the interesting class association rules is proposed, based on an evolutionary method named genetic relation algorithm. In each generation, the algorithm evaluates the relevance and interestingness of the discovered association rules through the relationships between the rules, using a specific measure of distance among them, and yields a reduced set of rules in the final generation. This small rule set has the following properties: (i) it is accurate, as it has at least the same classification accuracy as the complete association rule set; (ii) it is interesting, because of the diversity of its rules; and (iii) it is comprehensible, because the number of attributes involved in the rules is also small, which makes the rules easier for users to understand. The proposed method is compared with other conventional methods, including genetic network programming-based mining, on ten databases, and the experimental results show that it outperforms the others while keeping a good balance between classification accuracy and the comprehensibility of the rules.
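To give a feel for selecting a small, diverse rule set by means of a distance between rules, the sketch below uses two stand-ins that are not taken from the paper: the Jaccard distance between the attribute sets of two rules in place of the paper's unspecified distance measure, and a greedy farthest-point selection in place of the genetic relation algorithm. Rules are represented simply as tuples of attribute names, which is also an assumption.

```python
def jaccard_distance(rule_a, rule_b):
    """1 - |A ∩ B| / |A ∪ B| over the attribute sets of two rules."""
    a, b = set(rule_a), set(rule_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_diverse(rules, k):
    """Greedily pick up to k rules, each as far as possible from those chosen.

    Starts from the first rule, then repeatedly adds the remaining rule whose
    nearest already-chosen rule is farthest away.
    """
    remaining = list(rules)
    if not remaining or k <= 0:
        return []
    chosen = [remaining.pop(0)]
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda r: min(jaccard_distance(r, c) for c in chosen),
        )
        remaining.remove(best)
        chosen.append(best)
    return chosen


# Toy rules given by the attributes they involve (illustrative only).
rules = [("a", "b"), ("a", "b", "c"), ("d", "e"), ("a", "c")]
reduced = select_diverse(rules, k=2)
```

In this toy case the near-duplicate rule `("a", "b", "c")` is skipped in favour of `("d", "e")`, which shares no attribute with the first rule. A real selection would also have to account for classification accuracy, which this sketch ignores.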