2005
DOI: 10.1007/11551362_33

Importance of HTML Structural Elements and Metadata in Automated Subject Classification

Abstract: The aim of the study was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification. The data collection comprised 1000 Web pages in engineering to which Engineering Information (Ei) classes had been manually assigned. The significance indicators were derived using several different methods: (total and partial) precision and recall, semantic distance, and multiple regression. It was s…
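The abstract does not give the indicator values the study derived. The sketch below shows one simple way per-element significance indicators could enter a classifier. It scales each term's count by the weight of the element it appears in, then picks the class whose term list collects the highest score. The weights, the whitespace tokenizer, and the `class_terms` vocabularies are all hypothetical assumptions, not the paper's method.

```python
from collections import Counter

# Hypothetical significance indicators per Web page element. The study derives
# its own values (precision/recall, semantic distance, multiple regression),
# which are not reproduced here.
ELEMENT_WEIGHTS = {"metadata": 2.0, "title": 3.0, "headings": 1.5, "text": 1.0}


def weighted_term_scores(elements: dict, weights: dict = ELEMENT_WEIGHTS) -> Counter:
    """Sum term occurrences across elements, each scaled by its element's weight."""
    scores: Counter = Counter()
    for element, text in elements.items():
        w = weights.get(element, 1.0)  # unknown elements get a neutral weight
        for term in text.lower().split():  # naive whitespace tokenization
            scores[term] += w
    return scores


def classify(elements: dict, class_terms: dict, weights: dict = ELEMENT_WEIGHTS) -> str:
    """Return the class whose (hypothetical) term set has the highest weighted score."""
    scores = weighted_term_scores(elements, weights)
    return max(class_terms, key=lambda c: sum(scores[t] for t in class_terms[c]))
```

With these toy weights, a term in the title counts three times as much as the same term in the main text. Changing the weights is how different significance indicators would shift the outcome.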

Cited by 28 publications (21 citation statements); references 12 publications.
“…By comparing automatically assigned classes to manually assigned ones at all the five levels of specificity (Ei has five hierarchical levels), the F1 measure was 0,26, whereas if comparison was done by reducing all the classes to the first two hierarchical levels, F1 was 0,59 (K. Golub and A. Ardö 2005). Also, an additional evaluation was performed, in which a subject expert evaluated both the automatically and manually assigned classes of a random sample of 109 Web pages.…”
Section: Algorithm (mentioning)
confidence: 99%
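The gap between F1 = 0,26 at full specificity and F1 = 0,59 after reducing classes to the first two hierarchical levels comes from truncating class codes before comparison. The sketch below illustrates that truncation. It assumes a hypothetical encoding of each class as a tuple with one label per level, and it uses micro-averaged F1. The statement does not say which averaging was used. The toy labels and data are invented and do not reproduce the reported numbers.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding: one label per hierarchical level (Ei has five levels).
ClassCode = Tuple[str, ...]


def reduce_codes(codes: Set[ClassCode], levels: Optional[int]) -> Set[ClassCode]:
    """Cut each code to its first `levels` levels; None keeps full specificity."""
    if levels is None:
        return set(codes)
    return {code[:levels] for code in codes}


def micro_f1(auto: Dict[str, Set[ClassCode]],
             manual: Dict[str, Set[ClassCode]],
             levels: Optional[int] = None) -> float:
    """Micro-averaged F1 of automatic vs. manual classes, pooled over all pages."""
    tp = fp = fn = 0
    for page in set(auto) | set(manual):
        a = reduce_codes(auto.get(page, set()), levels)
        m = reduce_codes(manual.get(page, set()), levels)
        tp += len(a & m)  # classes assigned by both
        fp += len(a - m)  # automatic only
        fn += len(m - a)  # manual only
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

An automatic class that agrees with the manual one only at the top two levels counts as an error at full depth but as a match once both are truncated. This is why the reduced comparison yields a higher F1.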
“…Importance of HTML structural elements and metadata in automated subject classification is shown in paper [11]. The aim of the paper was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification.…”
Section: A. Related Work (mentioning)
confidence: 99%
“…In [3], the use of information derived from HTML tags of a page for classification, is proposed. Similar method, in which the HTML tags are divided into three groups with different importance of terms in each group, is described in [4].…”
Section: A. Term Weighting for Classification (mentioning)
confidence: 99%
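The statement about [4] describes splitting HTML tags into three groups, each giving its terms a different importance. The sketch below illustrates that idea with the standard-library `html.parser.HTMLParser`. The tag-to-group assignment and the group weights are hypothetical assumptions, since the excerpt names neither. Text is weighted by the most important group among its enclosing tags, and text outside any listed tag falls into group 3.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical three-way grouping of tags; group 3 covers all other text.
TAG_GROUPS = {"title": 1, "h1": 1, "h2": 1,
              "h3": 2, "h4": 2, "b": 2, "strong": 2, "em": 2}
GROUP_WEIGHTS = {1: 3.0, 2: 2.0, 3: 1.0}  # hypothetical term weights per group
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}  # never closed


class GroupedTermCounter(HTMLParser):
    """Counts terms, weighting each by the most important enclosing tag group."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list = []
        self.scores: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed inner tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        groups = [TAG_GROUPS[t] for t in self.open_tags if t in TAG_GROUPS]
        weight = GROUP_WEIGHTS[min(groups, default=3)]  # lower = more important
        for term in data.lower().split():
            self.scores[term] += weight


def grouped_scores(html: str) -> Counter:
    """Parse an HTML string and return its group-weighted term scores."""
    parser = GroupedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.scores
```

A term in the page title thus outweighs the same term in body text. Repeated occurrences across groups add up, which lets a term's position and frequency both contribute to its score.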