1994
DOI: 10.1145/183422.183423

Automated learning of decision rules for text categorization

Abstract: We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to discover automatically classification patterns that can be used for general document categorization or personalized filtering of free text. Previous reports indicate that human-engineered rule-based systems, requiring many man-years of developmental efforts, have been successfully built to “read” documents and assign topics to them. We show that machine-ge…

Cited by 626 publications (298 citation statements)
References: 11 publications
“…In a number of experiments (Lewis, 1992a, 1992b, 1992c; Apté et al., 1994), it was found that the use of phrases actually caused text categorization performance to degrade. Despite these discouraging results, investigations of using phrases have been actively pursued (Mladenić and Grobelnik, 1998; Fürnkranz, 1998; Schütze et al., 1995; Schapire et al., 1998).…”
Section: Introduction (mentioning)
confidence: 99%
“…A non-exhaustive list of machine learning approaches to text categorization includes naive Bayes [5], k-nearest neighbors [4], SVM [6], boosting [7], and rule-learning algorithms [8]. However, most of these studies apply text classification to a small set of classes (usually a few hundred, as in the paradigmatic Reuters collection [9]).…”
Section: Concept Mapping As a Learning-free Classification Task (mentioning)
confidence: 99%
“…For each triplet provided in table 1, the first letter refers to the term frequency, the second refers to the inverse document frequency and the third letter refers to a normalization factor. 8 Available on the first author's homepage.…”
Section: Regular Expressions and Mesh Thesaurus (mentioning)
confidence: 99%
“…In the 1980s a common approach to document classification was rule-based, which involved a human in the construction of the classifier. Though such a method provides accurate rules and has the additional benefit of being human-understandable, the construction of such rules requires significant human input, and the human needs some knowledge of the details of rule construction as well as domain knowledge, which becomes a bottleneck of this approach [1]. As an alternative, the machine learning (ML) approach has become the dominant one since the 1990s.…”
Section: Introduction (mentioning)
confidence: 99%
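
One of the statements above describes term weights as a triplet: a term-frequency component, an inverse-document-frequency component, and a normalization factor. The table it refers to is not reproduced here, so the following is only a minimal sketch of one possible triplet, assuming logarithmic term frequency, logarithmic inverse document frequency, and Euclidean-length normalization. The function name `weight_documents` and the toy documents are hypothetical and are not taken from any of the cited papers.

```python
import math
from collections import Counter


def weight_documents(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed triplet (an illustration, not the cited paper's scheme):
      term frequency : 1 + ln(raw count of the term in the document)
      inverse doc. frequency : ln(N / number of documents containing the term)
      normalization : divide each weight by the vector's Euclidean length
    """
    n_docs = len(docs)
    # Count each term once per document to get its document frequency.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    weighted = []
    for doc in docs:
        counts = Counter(doc)
        raw = {
            term: (1 + math.log(count)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        length = math.sqrt(sum(w * w for w in raw.values()))
        # A term that occurs in every document has idf 0; guard against a zero vector.
        weighted.append(
            {term: (w / length if length else 0.0) for term, w in raw.items()}
        )
    return weighted


# Hypothetical toy collection of three tokenized documents.
docs = [["wheat", "export", "wheat"], ["oil", "export"], ["wheat", "price"]]
vectors = weight_documents(docs)
```

In the first toy document, "wheat" occurs twice and "export" once, and both appear in two of the three documents, so "wheat" receives the larger weight. Other triplets swap in different choices for any of the three components.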