Automated Classification of Web Documents into a Hierarchy of Categories

Ceci, Michelangelo; Esposito, Floriana; Lapi, Michele; Malerba, Donato

doi:10.1007/978-3-540-36562-4_6

Cited by 1 publication

(2 citation statements)

References 5 publications

“…By assuming that documents to be rejected have a low posterior probability for all categories, the problem can be reformulated in a different way, namely, how to define a threshold for the value taken by a naïve classifier. Details on the thresholding algorithm are reported in [5].…”

Section: The Classification Methods (mentioning)

confidence: 99%
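
The rejection idea in the statement above can be illustrated with a minimal sketch. It is not the thresholding algorithm of [5], whose details are not given here. It assumes a multinomial naive Bayes model with add-one smoothing and a single global threshold on the highest posterior; the function names, the toy data, and the threshold value are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect document and word counts from (tokens, category) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, word_counts, vocabulary


def posteriors(tokens, doc_counts, word_counts, vocabulary):
    """Return normalized posterior probabilities, one per category."""
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for category, n_docs in doc_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        log_scores[category] = score
    # Normalize in log space to avoid underflow on long documents.
    peak = max(log_scores.values())
    norm = sum(math.exp(s - peak) for s in log_scores.values())
    return {c: math.exp(s - peak) / norm for c, s in log_scores.items()}


def classify_with_reject(post, threshold):
    """Return the most probable category, or None when it falls below the threshold."""
    best = max(post, key=post.get)
    return best if post[best] >= threshold else None


# Toy training set (hypothetical data, for illustration only).
training = [
    (["ball", "goal", "team"], "sports"),
    (["team", "match", "goal"], "sports"),
    (["recipe", "oven", "salt"], "cooking"),
    (["salt", "recipe", "pan"], "cooking"),
]
model = train_naive_bayes(training)
accepted = classify_with_reject(posteriors(["goal", "team"], *model), 0.8)
rejected = classify_with_reject(posteriors(["goal", "salt"], *model), 0.8)
```

In this sketch a document is rejected when no category reaches the threshold. A per-category threshold, learned from training data, would be a natural variation, but whether [5] takes that route is not stated in the excerpt.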

“…More precisely, this results from a tight integration of the system WISDOM++, which performs document understanding on the basis of geometrical information, with the content-based classification capabilities provided by the system WebClassII [4]. WebClassII is a client-server application that performs the automated classification of Web pages on the basis of their textual content.…”

Section: Introduction (mentioning)

confidence: 99%

An Integrated Approach for Automatic Semantic Structure Extraction in Document Images

Berardi; Lapi; Malerba

2004

Document Analysis Systems VI

Self Cite

Abstract. In this paper we present an integrated approach for semantic structure extraction in document images. Document images are first processed to extract both their layout and logical structures on the basis of geometrical and spatial information. The textual content of logical components is then used for automatic semantic labeling of layout structures. Different machine learning techniques are applied to support the whole process. Experimental results on a set of biomedical multi-page documents are discussed and future directions are outlined.