Intelligent Information Processing and Web Mining 2003
DOI: 10.1007/978-3-540-36562-4_6

Automated Classification of Web Documents into a Hierarchy of Categories

Abstract: In this paper, the problem of classifying HTML documents into a hierarchy of categories is investigated in the context of a cooperative information repository named WebClassII. The hierarchy of categories is involved in all aspects of automated document classification, namely feature extraction, learning, and classification of a new document. Innovative aspects of this work are: a) an experimental study on actual Web documents which can be associated with any node in the hierarchy; b) the feature sele…

Cited by 1 publication (2 citation statements)
References: 5 publications
“…By assuming that documents to be rejected have a low posterior probability for all categories, the problem can be reformulated in a different way, namely, how to define a threshold for the value taken by a naïve classifier. Details on the thresholding algorithm are reported in [5].…”
Section: The Classification Methods (mentioning)
confidence: 99%
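The rejection idea in the statement above can be illustrated with a short sketch. The thresholding algorithm itself is only referenced (as [5]) and is not reproduced here, so this is not that algorithm: it simply assumes one fixed threshold per category and rejects a document when no category's posterior reaches its threshold. The function name `classify_with_rejection` and the threshold values are hypothetical.

```python
from typing import Dict, Optional


def classify_with_rejection(
    posteriors: Dict[str, float],
    thresholds: Dict[str, float],
) -> Optional[str]:
    """Return the most probable accepted category, or None to reject.

    Assumption (not from the source): each category has its own fixed
    threshold, and a category is a candidate only if its posterior is at
    least that threshold. A category with no threshold is never accepted.
    """
    candidates = {
        category: p
        for category, p in posteriors.items()
        if p >= thresholds.get(category, float("inf"))
    }
    if not candidates:
        # Low posterior for all categories: reject the document.
        return None
    # Among accepted categories, pick the one with the highest posterior.
    return max(candidates, key=candidates.get)


# Hypothetical posteriors produced by a naive Bayes classifier.
thresholds = {"sports": 0.5, "science": 0.5, "arts": 0.5}
confident = {"sports": 0.2, "science": 0.7, "arts": 0.1}
uncertain = {"sports": 0.4, "science": 0.35, "arts": 0.25}

accepted = classify_with_rejection(confident, thresholds)
rejected = classify_with_rejection(uncertain, thresholds)
```

Here `accepted` is `"science"` and `rejected` is `None`. How the thresholds are actually chosen is the open question the statement raises; in this sketch they are simply given as inputs.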
“…More precisely, this results from a tight integration of the system WISDOM++, which performs document understanding on the basis of geometrical information, with the content-based classification capabilities provided by the system WebClassII [4]. WebClassII is a client-server application that performs the automated classification of Web pages on the basis of their textual content.…”
Section: Introduction (mentioning)
confidence: 99%