Neighbourhood Exploitation in Hypertext Categorization

Benbrahim, Houda; Bramer, Max

doi:10.1007/1-84628-102-4_19

Cited by 5 publications (2 citation statements)

References 10 publications

“…They concluded that the use of basic text content enhanced with weighted extra information (metadata + title + link anchors) improves the performance of three different classifiers. In (Benbrahim and Bramer 2004b), they used the same dataset to investigate the influence of the neighbourhood pages (incoming and outgoing pages of the target document) on classification accuracy. It was concluded that the intelligent use of this information helps improve the accuracy of the different classifiers used.…”

Section: Hypertext Categorization (mentioning)

Text and Hypertext Categorization. Benbrahim and Bramer, 2009. Lecture Notes in Computer Science (self-citation).

Automatic categorization of text documents has become an important area of research in the last two decades, with features that make it significantly more difficult than the traditional classification tasks studied in machine learning. A more recent development is the need to classify hypertext documents, most notably web pages. These have features that add further complexity to the categorization task but also offer the possibility of using information that is not available in standard text classification, such as metadata and the content of the web pages that point to and are pointed at by a web page of interest. This chapter surveys the state of the art in text categorization and hypertext categorization, focussing particularly on issues of representation that differentiate them from 'conventional' classification tasks and from each other.
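The excerpt and the abstract above describe two sources of extra evidence for a web page: weighted metadata, title and link-anchor text, and the pages that link to or from it. The Python sketch below shows one way these could be combined into a single bag-of-words vector. It is not the authors' method: the field weights, the cosine-similarity test used to decide which neighbours count as "similar", the threshold of 0.3 and the neighbour weight of 0.5 are all illustrative assumptions, as are the function names.

```python
import math
import re
from collections import Counter

# Illustrative field weights (assumed values, not taken from the cited papers).
FIELD_WEIGHTS = {"body": 1.0, "title": 3.0, "meta": 2.0, "anchors": 2.0}


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def weighted_bag(fields: dict[str, str], weights: dict[str, float] = FIELD_WEIGHTS) -> Counter:
    """Sum term counts over the page's fields, scaling each field by its weight."""
    bag: Counter = Counter()
    for name, text in fields.items():
        w = weights.get(name, 1.0)  # unknown fields default to weight 1
        for tok in tokenize(text):
            bag[tok] += w
    return bag


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_similar_neighbours(target: Counter, neighbours: list[Counter],
                           threshold: float = 0.3, weight: float = 0.5) -> Counter:
    """Merge in only those linked pages (incoming or outgoing) whose cosine
    similarity to the target is at least `threshold`, scaled by `weight`."""
    enriched = Counter(target)
    for page in neighbours:
        if cosine(target, page) >= threshold:
            for tok, v in page.items():
                enriched[tok] += weight * v
    return enriched


# Usage: one on-topic neighbour is merged, one off-topic neighbour is ignored.
page = weighted_bag({"body": "svm text classifier", "title": "text classification"})
on_topic = weighted_bag({"body": "text classifier with svm"})
off_topic = weighted_bag({"body": "cheap flights and hotels"})
enriched = add_similar_neighbours(page, [on_topic, off_topic])
print(sorted(enriched.items()))
```

Filtering neighbours by similarity, rather than merging every linked page, is one simple way to keep unrelated pages (advertising, navigation hubs) from diluting the target's representation.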

“…The content of HTML pages, along with their corresponding extra information extracted. Each document representation is enhanced by its title + link anchor + meta data + similar neighbour [15]. However, when using labelled and unlabelled data for learning a classifier, we have to specify how the unlabelled data will participate in the different steps of the hypertext representation, namely, indexation, feature reduction and vocabulary generation.…”
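The excerpt above notes that, once unlabelled pages are used for learning, one must decide how they take part in indexation, feature reduction and vocabulary generation. The minimal Python sketch below illustrates one such choice: a vocabulary built by document-frequency thresholding, with a flag deciding whether unlabelled documents count towards those frequencies. Document-frequency thresholding, the `min_df=2` value and the function names are assumptions for illustration, not the procedure reported in the cited work.

```python
from collections import Counter


def build_vocabulary(labelled: list[list[str]], unlabelled: list[list[str]],
                     use_unlabelled: bool, min_df: int = 2) -> list[str]:
    """Return the sorted terms found in at least `min_df` documents.

    `use_unlabelled` controls whether unlabelled documents count towards the
    document frequencies used for feature reduction.
    """
    docs = labelled + (unlabelled if use_unlabelled else [])
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return sorted(term for term, n in df.items() if n >= min_df)


def index(doc: list[str], vocab: list[str]) -> list[int]:
    """Index a tokenized document as term frequencies over a fixed vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


labelled = [["svm", "text"], ["svm", "kernel"]]
unlabelled = [["kernel", "margin"], ["margin", "text"]]
vocab_lab = build_vocabulary(labelled, unlabelled, use_unlabelled=False)
vocab_all = build_vocabulary(labelled, unlabelled, use_unlabelled=True)
print(vocab_lab, vocab_all, index(["svm", "svm", "margin"], vocab_all))
```

With `use_unlabelled=False` only "svm" survives the threshold; counting the unlabelled pages as well keeps "kernel", "margin" and "text", so the same indexing step yields a larger feature space.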

Section: The Classification Task (mentioning)

A Fuzzy Semi-Supervised Support Vector Machines Approach to Hypertext Categorization. Benbrahim and Bramer. Artificial Intelligence in Theory and Practice II (self-citation).

Hypertext and text domains are characterized by tens or hundreds of thousands of features. This is a challenge for supervised learning algorithms, which must learn accurate classifiers from a small set of available training examples. In this paper, a fuzzy semi-supervised support vector machines (FSS-SVM) algorithm is proposed. It aims to overcome the need for a large labelled training set by training on both labelled and unlabelled data, while modulating the effect of the unlabelled data on the learning process. Empirical evaluations on two real-world hypertext datasets showed that, by additionally using unlabelled data, FSS-SVM requires less labelled training data than its supervised counterpart, support vector machines, to achieve the same level of classification performance. In addition, incorporating fuzzy membership values for the unlabelled training patterns improved classification performance compared with the crisp variant.
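The abstract does not give the FSS-SVM formulation, so the Python sketch below only illustrates the general idea of modulating unlabelled data with fuzzy memberships: a linear SVM objective in which each unlabelled point receives a pseudo-label from the current model and its hinge loss is scaled by a membership value. The exponential membership function, the weights `c` and `c_u`, and all function names are hypothetical; setting every membership to 1 gives a crisp analogue. The sketch evaluates the objective only and does not show how it would be minimized.

```python
import math


def decision(w: list[float], b: float, x: list[float]) -> float:
    """Linear decision function f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge(y: int, f: float) -> float:
    """Standard hinge loss max(0, 1 - y*f)."""
    return max(0.0, 1.0 - y * f)


def membership(f: float, scale: float = 1.0) -> float:
    """Hypothetical fuzzy membership in [0, 1): near 0 for points on the
    decision boundary, approaching 1 for confidently classified points."""
    return 1.0 - math.exp(-abs(f) / scale)


def fuzzy_ss_objective(w, b, labelled, unlabelled, c=1.0, c_u=0.5, fuzzy=True):
    """0.5*||w||^2 + c * sum of labelled hinge losses
    + c_u * sum over unlabelled x of s(x) * hinge(pseudo-label, f(x)).

    With fuzzy=False every s(x) is 1 (a crisp analogue)."""
    reg = 0.5 * sum(wi * wi for wi in w)
    lab = sum(hinge(y, decision(w, b, x)) for x, y in labelled)
    unl = 0.0
    for x in unlabelled:
        f = decision(w, b, x)
        y_hat = 1 if f >= 0 else -1  # pseudo-label from the current model
        s = membership(f) if fuzzy else 1.0
        unl += s * hinge(y_hat, f)
    return reg + c * lab + c_u * unl


w, b = [1.0, 0.0], 0.0
labelled = [([2.0, 0.0], 1), ([-0.5, 1.0], -1)]
unlabelled = [[0.0, 3.0], [0.5, 0.0]]
fuzzy_obj = fuzzy_ss_objective(w, b, labelled, unlabelled)
crisp_obj = fuzzy_ss_objective(w, b, labelled, unlabelled, fuzzy=False)
print(fuzzy_obj, crisp_obj)
```

Here the unlabelled point lying exactly on the decision boundary contributes nothing to the fuzzy objective but a full unit of hinge loss to the crisp one, which illustrates how memberships can damp the influence of uncertain pseudo-labels.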
