Abstract: This paper considers the automatic design of fuzzy-rule-based classification systems from labeled data. The performance of classifiers and the interpretability of generated rules are of major importance in these systems. In past research, some genetic-based algorithms have been used for the rule learning process. These genetic fuzzy systems have utilized different approaches to encode rules. In this paper, we propose a novel steady-state genetic algorithm for extracting a compact set of good fuzzy rules from numerical data (SGERD). The selection mechanism of this algorithm is nonrandom, and only the best individuals can survive. Our approach is very simple and fast, and can be applied to high-dimensional problems with numerical attributes. To select the rules having high generalization capabilities, our algorithm makes use of some rule- and data-dependent parameters. We also propose an enhancing function that modifies the rule evaluation measures in order to assess the candidate rules more effectively before their selection. Experiments on some well-known data sets are performed to show the performance of SGERD.
Index Terms: Data mining, fuzzy-rule-based classification system, fuzzy rule learning, steady-state genetic algorithm.
In this paper, we propose a fuzzy rule-based classifier for assigning amino acid sequences to different superfamilies of proteins. While the most popular methods for protein classification rely on sequence alignment, our approach is alignment-free and so more human readable. Like other alignment-independent methods, it uses the distribution of contiguous patterns of n amino acids (n-grams) in the sequences as features. Our approach first extracts a large number of features from a set of training sequences, then selects only the best of them using a proposed feature ranking method. Thereafter, using these features, a novel steady-state genetic algorithm for extracting fuzzy classification rules from data is used to generate a compact set of interpretable fuzzy rules. The generated rules are simple and human understandable, so biologists can use them for classification purposes, or apply their own expertise to interpret or even modify them. To evaluate the performance of our fuzzy rule-based classifier, we have compared it with the conventional nonfuzzy C4.5 algorithm, as well as some other fuzzy classifiers. This comparative study is conducted by classifying the protein sequences of five superfamily classes, downloaded from a public-domain database. The obtained results show that the generated fuzzy rules are more interpretable, with acceptable improvement in the classification accuracy.
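As a rough illustration of the n-gram features mentioned above, the following minimal Python sketch counts contiguous windows of n residues in a single sequence and converts the counts to relative frequencies. The function names `ngram_counts` and `ngram_frequencies`, the choice of relative-frequency normalization, and the toy sequence are assumptions made for illustration only; the abstract does not specify how the features are computed, and the proposed feature ranking method is not reproduced here.

```python
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count every contiguous window of n residues in one sequence.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_frequencies(sequence, n=2):
    """Turn raw n-gram counts into relative frequencies.

    Normalizing by the number of windows is an assumption; it makes
    sequences of different lengths comparable.
    """
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than n: no windows, so no features.
        return {}
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    toy_sequence = "MKVLMK"  # made-up example, not real data
    for gram, freq in sorted(ngram_frequencies(toy_sequence, n=2).items()):
        print(gram, round(freq, 3))
```

Each sequence then becomes a sparse vector indexed by n-gram, from which a subset of features could be selected before rule learning.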