Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2000
DOI: 10.1145/345508.345594

A practical hypertext categorization method using links and incrementally available class information

Abstract: As the WWW grows at an increasing speed, classifiers targeted at hypertext are in high demand. While document categorization is quite mature, the use of hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization using hyperlinks. In comparison against a recently proposed technique that appears to be the only one of its kind, we obtained up to 18.5% of improvement in ef…

Cited by 110 publications (81 citation statements)
References 11 publications
“…[4] uses factorized model to combine the content model and the link model. [14] tackles the problem by using the relaxation labeling technique. Besides the vast amount of work on link-enhanced text classification, there are increasing number of work focusing on link-enhanced clustering.…”
Section: Related Work (mentioning)
confidence: 99%
“…Exploiting link information of networked documents to enhance text classification has been studied extensively in the research community [3,4,6,14]. It is found that, although both content attributes and links can independently form reasonable text classifiers, an algorithm that exploits both information sources has the potential to improve the classification [2,10].…”
Section: Introduction (mentioning)
confidence: 99%
“…TC is used in many application contexts, ranging from automatic document indexing based on a controlled vocabulary (Borko and Bernick 1963; Gray and Harley 1971; Field 1975), to document filtering (Amati and Crestani 1999; Iyer, Lewis et al 2000; Kim, Hahn et al 2000), word sense disambiguation (Gale, Church et al 1992; Escudero, Marquez et al 2000), population of hierarchical catalogues of Web resources (Chakrabarti, Dom et al 1998; Attardi, Gulli et al 1999; Oh, Myaeng et al 2000), and in general any application requiring document organization or selective and adaptive document dispatching.…”
Section: Text Categorization (mentioning)
confidence: 99%
“…It was concluded that the intelligent use of this information helps improve the accuracy of the different classifiers used. (Oh, Myaeng et al 2000) reported some observations on a collection of online Korean encyclopaedia articles. They used system-predicted categories of the linked neighbours of a test document to reinforce the classification decision on that document and they obtained a 13% improvement over the baseline performance when using local text alone.…”
Section: Hypertext Categorization (mentioning)
confidence: 99%
“…Oh et al [6] reported some observations on a collection of online Korean encyclopaedia articles. They used system-predicted categories of the linked neighbours of a test document to reinforce the classification decision on that document and they obtained a 13% improvement over the baseline performance when using local text alone.…”
Section: Introduction (mentioning)
confidence: 99%
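
The last two statements describe reinforcing a document's classification with the system-predicted categories of its linked neighbours. The sketch below illustrates that general idea only. It is not the formula from the paper: the function names, the linear blend, and the `weight` parameter are all assumptions made for illustration, and the scores are invented toy values.

```python
from collections import Counter


def reinforce_with_neighbours(local_scores, neighbour_labels, weight=0.5):
    """Blend text-only class scores with votes from linked neighbours.

    local_scores: dict mapping category -> score from a text-only classifier.
    neighbour_labels: system-predicted categories of the linked documents.
    weight: hypothetical mixing factor in [0, 1]; 0 ignores the links.
    Only categories present in local_scores are scored.
    """
    if not neighbour_labels:
        # No links available: fall back to the local text alone.
        return dict(local_scores)
    votes = Counter(neighbour_labels)
    total = sum(votes.values())
    return {
        cat: (1 - weight) * score + weight * votes.get(cat, 0) / total
        for cat, score in local_scores.items()
    }


def classify(local_scores, neighbour_labels, weight=0.5):
    """Return the category with the highest blended score."""
    combined = reinforce_with_neighbours(local_scores, neighbour_labels, weight)
    return max(combined, key=combined.get)


# Toy example (invented numbers): the text slightly favours "science",
# but two of the three linked neighbours were predicted as "history".
local = {"history": 0.45, "science": 0.55}
neighbours = ["history", "history", "science"]

print(classify(local, []))          # local text alone
print(classify(local, neighbours))  # local text plus neighbour votes
```

With these toy values the text-only decision is "science", while the blended decision switches to "history", which shows how neighbour predictions can override a weak local signal. A real system would need to choose the blend and its weight empirically.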