Proceedings of the Eleventh International Conference on Information and Knowledge Management 2002
DOI: 10.1145/584792.584850

Boosting to correct inductive bias in text classification

Abstract: This paper studies the effects of boosting in the context of different classification methods for text categorization, including Decision Trees, Naive Bayes, Support Vector Machines (SVMs) and a Rocchio-style classifier. We identify the inductive biases of each classifier and explore how boosting, as an error-driven resampling mechanism, reacts to those biases. Our experiments on the Reuters-21578 benchmark show that boosting is not effective in improving the performance of the base classifiers on common categ…

Cited by 26 publications (13 citation statements)
References 21 publications
“…DML is a technique used to identify a suitable distance metric based on the data projection that can be divided into four families [30]. The first two families are based on the supervision of the method: supervised and unsupervised DML.…”
Section: Derma: Melanoma Diagnosis Based On Collaborative Multilab
Citation type: mentioning
confidence: 99%
“…Possible learning methods include regression models, nearest neighbor classifiers, decision trees, Bayesian probabilistic classifiers, inductive rule learning algorithms, neural networks, online learning approaches, support vector machines, genetic programming techniques, and many hybrid methods. Instead of fixating on a single classification technique, the research will explore ensemble approaches that combine different techniques, such as bagging [2], boosting [5,21] and staged approaches.…”
Section: Supporting Multi-dimensional Design Exploration
Citation type: mentioning
confidence: 99%
“…Centroid-based algorithm [1] is a commonly used method for text categorization due to the simplicity and linearity. But it often suffers from the inductive bias [2] or model misfit [3] and researches have proposed some methods to further adjust the centroids to make the centroid-based algorithm perform better.…”
Section: Introduction
Citation type: mentioning
confidence: 99%