Research and Development in Intelligent Systems XXI
DOI: 10.1007/1-84628-102-4_19

Neighbourhood Exploitation in Hypertext Categorization

Abstract: As the web expands exponentially, the need to put some order to its content becomes apparent. Hypertext categorization, that is, the automatic classification of web documents into predefined classes, emerged to relieve humans of that task. The extra information available in a hypertext document poses new challenges for automatic categorization. HTML tags and the linked neighbourhood both provide rich information for hypertext categorization that is not available in traditional text classification. This paper looks at…

Citations: cited by 5 publications (2 citation statements)
References: 10 publications
“…They concluded that the use of basic text content enhanced with weighted extra information (metadata + title + link anchors) improves the performance of three different classifiers. In (Benbrahim and Bramer 2004b), they used the same dataset to investigate the influence of the neighbourhood pages (incoming and outgoing pages of the target document) on classification accuracy. It was concluded that the intelligent use of this information helps improve the accuracy of the different classifiers used.…”
Section: Hypertext Categorization (mentioning)
confidence: 99%
“…The content of HTML pages, along with their corresponding extra information extracted. Each document representation is enhanced by its title + link anchor + meta data + similar neighbour [15]. However, when using labelled and unlabelled data for learning a classifier, we have to specify how the unlabelled data will participate in the different steps of the hypertext representation, namely, indexation, feature reduction and vocabulary generation.…”
Section: The Classification Task (mentioning)
confidence: 99%