2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI'06), 2006
DOI: 10.1109/wi.2006.19

A Method for Focused Crawling Using Combination of Link Structure and Content Similarity

Abstract: The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. A focused crawler aims to selectively seek out pages that are relevant to a predefined set of topics. Besides specifying topics by keywords, it is customary to also use exemplary documents to compute the similarity of a given web document to the topic. In this paper we introduce a new hybrid focused crawler, which uses link structure of documents as well as similar…

Cited by 29 publications (19 citation statements)
References 6 publications
“…Jamali, H. Sayyadi, B. B. Hariri and H. Abolhassani [12] uses a combination of the link structure of web documents and content similarity of web documents with respect to the given topic. They treated hyperlinks and contents present in the web pages as the authors view about other web pages and it relates them to a particular domain.…”
Section: Related Work (mentioning)
confidence: 99%
“…The Best-First [4] algorithm focuses on the retrieval of pages which are relevant to a particular given topic. It's an algorithm that uses a score to define which page has a best score.…”
Section: The Best-first Search (mentioning)
confidence: 99%
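The Best-First idea quoted above can be illustrated with a small sketch: keep the unfetched URLs in a priority queue and always visit the one with the highest score next. This is an illustration under stated assumptions, not code from the cited paper. The function name `best_first_order` and the callables `get_links` and `score` are hypothetical, and the sketch works on an in-memory graph rather than fetching real pages.

```python
import heapq
from typing import Callable, Iterable, List


def best_first_order(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],  # hypothetical: returns out-links of a page
    score: Callable[[str], float],              # hypothetical: relevance estimate for a URL
    max_pages: int,
) -> List[str]:
    """Return URLs in the order a best-first crawler would visit them."""
    frontier = []   # heap of (-score, insertion_index, url); negation gives max-first order
    seen = set()    # URLs already queued or visited
    counter = 0     # tie-breaker so equal scores pop in insertion order

    def enqueue(url: str) -> None:
        nonlocal counter
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (-score(url), counter, url))
            counter += 1

    for url in seeds:
        enqueue(url)

    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)  # highest-scoring unfetched URL
        visited.append(url)
        for link in get_links(url):
            enqueue(link)
    return visited


# Toy link graph and scores, made up for illustration only.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
toy_scores = {"a": 1.0, "b": 0.2, "c": 0.9, "d": 0.5}
order = best_first_order(
    ["a"], lambda u: toy_graph.get(u, []), lambda u: toy_scores[u], max_pages=10
)
print(order)
```

On the toy graph, page `c` is visited before `b` because its score is higher, even though `b` was discovered first.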
“…Others are based on meta search and content block partition, as in [18]. Jamali et al [12] used a combination of link structure of the fetched pages and the content similarity of a document to a certain domain. This mixture of link structure and content is used to compute a ranking score for the candidate unfetched pages.…”
Section: Related Work (mentioning)
confidence: 99%
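One way to picture a ranking score that mixes link structure and content similarity, as the statement above describes, is a weighted sum of the two signals. The sketch below is an assumption-laden illustration, not the formula from the cited paper: the bag-of-words cosine similarity, the use of the mean relevance of already fetched parent pages as the link signal, and the weight `alpha` are all hypothetical choices.

```python
import math
from collections import Counter
from typing import Sequence


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between whitespace-tokenized term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def combined_score(
    link_context: str,               # assumed: text around the link to the unfetched page
    topic_text: str,                 # assumed: keywords or an exemplary document for the topic
    parent_scores: Sequence[float],  # assumed: relevance of fetched pages linking to the candidate
    alpha: float = 0.5,              # assumed weight between content and link signals
) -> float:
    """Weighted mix of a content signal and a link-structure signal."""
    content = cosine_similarity(link_context, topic_text)
    link = sum(parent_scores) / len(parent_scores) if parent_scores else 0.0
    return alpha * content + (1 - alpha) * link
```

A score like this could serve as the priority for candidate unfetched pages in a crawl frontier. The value of `alpha` controls how much the content signal counts relative to the link signal.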