Proceedings of the Fourteenth ACM Conference on Hypertext and Hypermedia 2003
DOI: 10.1145/900051.900096

Refinement of TF-IDF schemes for web pages using their hyperlinked neighboring pages

Abstract: In IR (information retrieval) systems based on the vector space model, the TF-IDF scheme is widely used to characterize documents. However, in the case of documents with hyperlink structures such as Web pages, it is necessary to develop a technique for representing the contents of Web pages more accurately by exploiting the contents of their hyperlinked neighboring pages. In this paper, we first propose several approaches to refining the TF-IDF scheme for a target Web page by using the contents of its hyperlin…
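
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method, since the abstract is truncated before the proposed approaches are described. It assumes a plain TF-IDF weighting and a simple refinement that adds the averaged term weights of neighboring pages to the target page's vector. The function names `tf_idf` and `refine` and the mixing weight `alpha` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency times log(N / df), one common TF-IDF variant.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def refine(target, neighbors, alpha=0.5):
    """Add the mean of the neighbor vectors, scaled by alpha, to the target.

    alpha is a hypothetical mixing weight, not a value taken from the paper.
    """
    refined = dict(target)
    if not neighbors:
        return refined
    for vec in neighbors:
        for term, weight in vec.items():
            refined[term] = refined.get(term, 0.0) + alpha * weight / len(neighbors)
    return refined


# Toy corpus: page 0 is the target, page 1 is its hyperlinked neighbor.
docs = [
    ["hypertext", "retrieval", "web"],
    ["web", "page", "links"],
    ["retrieval", "model", "vector"],
]
vectors = tf_idf(docs)
refined = refine(vectors[0], [vectors[1]], alpha=0.5)
```

In this toy corpus the refined vector for page 0 gains the terms "page" and "links" from its neighbor, while terms that occur only in the target keep their original weight.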

Cited by 36 publications (14 citation statements)
References 20 publications
“…Among researches based on term frequency weighting, Sugiyama et al have proposed a method to accurately express the contents of web documents by segmenting Term Frequency-Inverse Document Frequency (TF-IDF) for pages connected with hyperlink. They have concluded, by forming feature vector, the curved shape of recall and precision [11]. Zhang et al experimented and searched the effective method for text classification through TF-IDF and Latent Semantic Indexing (LSI) (http://en.wikipedia.org/wiki/Latent_semantic_indexing).…”
Section: Related Work (mentioning)
confidence: 99%
“…A key aspect of our approach is that we include contextual evidence about each paper in the form of its neighboring papers: the papers that cite the target paper (we term these citation papers) and papers referenced by the target paper (reference papers). While the use of contextual information from neighbors is not new -it has also been successfully applied to the problems of Web page representation [33], Web page classification [25], spam detection [6] -to the best of our knowledge, it has not been utilized in scholarly paper recommendation. An additional desirable property of our approach is that it is domain-independent.…”
Section: Proposed Methods (mentioning)
confidence: 99%
“…Sugiyama et al [18] used the TREC-9 Web Track dataset [7] to estimate IDF values for web pages. The novel part of their work was to also include the content of hyperlinked neighboring pages in the TF-IDF calculation of a centroid page.…”
Section: Related Work (mentioning)
confidence: 99%
“…It is often used for term weighting in the vector space model as described by Salten et al [15]. It further can be used to generate lexical signatures (LSs) of web pages as shown in [14,13,6,10,18].…”
Section: Introduction (mentioning)
confidence: 99%