2002
DOI: 10.1007/3-540-36128-6_25

Mining HTML Pages to Support Document Sharing in a Cooperative System

Abstract: In this paper, the problem of classifying HTML documents is investigated in the context of a client-server application, named WebClass, developed to support the search activity of a geographically distributed group of people with common interests. The two main issues studied in the paper are the selection of some features to represent HTML documents and the construction of the classifiers. A new feature selection technique is presented and its interaction with different classifiers is experimentally …

Cited by 5 publications (7 citation statements)
References 14 publications (15 reference statements)

“…In WebClassIII it is based on an upgrade of the technique implemented and tested by Malerba et al (2002), named maxTF × DF² × ICF. Unlike other feature selection methods proposed in the literature on hierarchical document categorization (Mladenić & Grobelnik, 2003), maxTF × DF² × ICF answers the demand for terms that are shared by most of the documents of the same category and possibly no document of other categories.…”
Section: Comparison With Related Work: the Methods
Citation type: mentioning
confidence: 99%
“…For multi-class problems, as those considered in the framework proposed in this paper, Malerba et al (2002) developed a feature selection procedure that does take into account these observations. In this work, we develop an extension to the case of hierarchical training sets.…”
Section: The Feature Selection Process
Citation type: mentioning
confidence: 99%
“…It is based on an upgrade of the technique implemented and tested in WebClass, named TF-PF²-ICF. Indeed, a comparison with other two well-known feature selection measures showed better results in the case of flat classification [8].…”
Section: Hierarchical Document Classification: Issues and Related Work
Citation type: mentioning
confidence: 95%
“…Therefore, to facilitate sharing of Web documents among distributed work groups in a large organization, it is important to develop automated document classification tools that assist users in the process of document classification with respect to a given set of document categories. WebClass [8] is a client-server application that has been designed to support the search activity of a geographically distributed group of people with common interests. It works as an intermediary when users browse the Web through the system and categorize documents by means of one of the classification techniques available.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
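
The citation statements above describe maxTF × DF² × ICF as favoring terms that are shared by most documents of one category and by few or no documents of other categories. A minimal Python sketch of that idea follows. The term definitions are assumptions read off the measure's name and are not given on this page: maxTF is taken as the highest count of a term in any single document of the category, DF as the fraction of the category's documents containing the term, and ICF as the reciprocal of the number of categories in which the term occurs. The function name `score_terms` and the toy corpus are hypothetical.

```python
from collections import Counter


def score_terms(docs_by_category):
    """Score terms per category as maxTF * DF^2 * ICF (assumed definitions).

    docs_by_category maps a category name to a list of tokenized documents.
    """
    max_tf, df = {}, {}
    for cat, docs in docs_by_category.items():
        max_tf[cat] = Counter()   # highest in-document count of each term
        doc_count = Counter()     # number of documents containing each term
        for tokens in docs:
            for term, n in Counter(tokens).items():
                max_tf[cat][term] = max(max_tf[cat][term], n)
                doc_count[term] += 1
        # DF: fraction of the category's documents that contain the term
        df[cat] = {t: k / len(docs) for t, k in doc_count.items()}

    # CF: number of categories in which the term occurs; ICF = 1 / CF
    cf = Counter(t for cat in df for t in df[cat])

    return {
        cat: {t: max_tf[cat][t] * df[cat][t] ** 2 / cf[t] for t in df[cat]}
        for cat in df
    }


# Hypothetical toy corpus, for illustration only.
corpus = {
    "sports": [["goal", "match", "goal"], ["goal", "team"]],
    "finance": [["stock", "market"], ["stock", "team"]],
}
scores = score_terms(corpus)
best_finance_term = max(scores["finance"], key=scores["finance"].get)
```

In this toy corpus, "goal" appears in every sports document and in no finance document, so it receives the highest sports score, while "team" is penalized both for its low document frequency and for occurring in two categories. Squaring DF is what makes coverage within the category weigh more than raw frequency.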