Proceedings of the Seventh International Conference on Information and Knowledge Management 1998
DOI: 10.1145/288627.288651

Inductive learning algorithms and representations for text categorization

Abstract: Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size and alternative document representations. Very accurate text classifiers can be learned…

Cited by 1,065 publications (643 citation statements)
References 25 publications
“…A wide variety of learning approaches have been applied to TC, to name a few, Bayesian classification (Lewis and Ringuette 1994; Domingo and Pazzani 1996; Larkey and Croft 1996; Koller and Sahami 1997; Lewis 1998), decision trees (Weiss, Apte et al; Fuhr and Buckley 1991; Cohen and Hirsh 1998; Li and Jain 1998), decision rule classifiers such as CHARADE (Moulinier and Ganascia 1996), or DL-ESC (Li and Yamanishi 1999), or RIPPER (Cohen and Hirsh 1998), or SCAR (Moulinier, Raskinis et al 1996), or SCAP-1 (Apté, Damerau et al 1994), multi-linear regression models (Yang and Chute 1994; Yang and Liu 1999), Rocchio method (Hull 1994; Ittner, Lewis et al 1995; Sable and Hatzivassiloglou 2000), Neural Networks (Schütze, Hull et al 1995; Wiener, Pedersen et al 1995; Dagan, Karov et al 1997; Ng, Goh et al 1997; Lam and Lee 1999; Ruiz and Srinivasan 1999), example based classifiers (Creecy 1991; Masand, Linoff et al 1992; Larkey 1999), support vector machines (Joachims 1998), Bayesian inference networks (Tzeras and Hartmann 1993; Wai and Fan 1997; Dumais, Platt et al 1998), genetic algorithms (Masand 1994; Clack, Farringdon et al 1997), and maximum entropy modelling (Manning and Schütze 1999).…”
Section: Machine Learning Approaches To Text Categorization (mentioning)
confidence: 99%
“…Because this process is highly domain dependent and considering all possible combinations of tokens is impossible, many algorithms exist to define phrasal indexes. Although some researchers have reported an improvement in classification accuracy when using such indexes (depending on the quality of the generated phrases), a number of experimental results (Apté, Damerau et al 1994; Dumais, Platt et al 1998) have not been uniformly encouraging, irrespective of whether the notion of "phrase" is motivated (i) syntactically, i.e. the phrase is such according to the grammar of the language; or (ii) statistically, i.e.…”
Section: Indexing (mentioning)
confidence: 99%
“…In the late '90s, Machine Learning techniques were successfully applied to Text Classification. Support Vector Machines were applied to Text Classification in [6,4]. Maximum Entropy Models were also applied in [8].…”
Section: Related Work (mentioning)
confidence: 99%
“…Following the previous works [14,15,10], we build binary classifiers for top ten most populous categories. In our experiment, stop words were not eliminated, and title words were not distinguished with body words.…”
Section: Reuters 21587 Text Categorization Test Collection (mentioning)
confidence: 99%