2005
DOI: 10.1007/978-3-540-31989-4_8

Evolving Rules for Document Classification

Abstract: We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that because the induced rules are meaningful …
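To make the idea in the abstract concrete, the following is a minimal sketch of a Boolean rule over character n-grams scored by precision and recall against labelled documents. It is not the authors' function and terminal set: the helper names (`contains`, `AND`, `OR`, `NOT`, `precision_recall`, `f1`), the toy documents, and the use of F1 to combine precision and recall into a single fitness value are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A rule maps a document's text to True (in category) or False.
Rule = Callable[[str], bool]

def contains(ngram: str) -> Rule:
    """Terminal (hypothetical): true if the lowercase n-gram occurs in the text."""
    return lambda doc: ngram in doc.lower()

def AND(a: Rule, b: Rule) -> Rule:
    return lambda doc: a(doc) and b(doc)

def OR(a: Rule, b: Rule) -> Rule:
    return lambda doc: a(doc) or b(doc)

def NOT(a: Rule) -> Rule:
    return lambda doc: not a(doc)

def precision_recall(rule: Rule, docs: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Score a rule against (text, is_in_category) training pairs."""
    tp = sum(1 for text, label in docs if label and rule(text))
    fp = sum(1 for text, label in docs if not label and rule(text))
    fn = sum(1 for text, label in docs if label and not rule(text))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1(precision: float, recall: float) -> float:
    """Assumed fitness: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Toy training set for a made-up "grain" category (not Reuters 21578 data).
training_docs = [
    ("Wheat exports rose sharply", True),
    ("Grain shipments delayed at port", True),
    ("Corn harvest forecast cut", True),
    ("Crude oil prices and wheat futures fall", False),
    ("Bank raises interest rates", False),
    ("Grain trader fined by regulator", False),
]

# One candidate rule of the kind a genetic program might evolve.
candidate = AND(OR(contains("whea"), contains("grai")), NOT(contains("oil")))

p, r = precision_recall(candidate, training_docs)
fitness = f1(p, r)
print(f"precision={p:.3f} recall={r:.3f} fitness={fitness:.3f}")
```

In a genetic programming setting, a population of such rule trees would be recombined and mutated, with the fitness value guiding selection. That search loop is omitted here.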

Cited by 15 publications (4 citation statements)
References: 13 publications
“…Hirsch et al [23] implemented a GP system to evolve rules based on n-grams (strings n character long) for document classification. GP has been used by Shengen et al [24] to generate new features used as inputs to Support Vector Machines (SVM) and GP classifiers to detect link webspam.…”
Section: Related Work (mentioning)
confidence: 99%
“…Specific kinds of classification problems may well be more effectively solved by EAs using rule representations "tailored" to the target kind of problem. For instance, (Hirsch et al 2005) propose a rule representation tailored to document classification (i.e., a text mining problem), where strings of characters - in general fragments of words, rather than full words - are combined via Boolean operators to form classification rules.…”
Section: Individual Representation For Classification-rule Discovery (mentioning)
confidence: 99%
“…T and B are respectively defined as T = {t 1 , t 2 In the inference process, the MCRDR-Classifier evaluates each rule node of the knowledge base (KB). If a case is selected from the case list (CL), the system evaluates rules from the root node and the inference result is provided by the last rule satisfied in a pathway.…”
Section: Inference and Knowledge Acquisition (mentioning)
confidence: 99%
“…Although ML method produced accurate classifiers, there are a number of drawbacks as compared to a rule-based one [2]. Some limitations of the ML approach come from its assumptions about the document classification problem.…”
Section: Introduction (mentioning)
confidence: 99%