1998
DOI: 10.1016/s0169-7552(98)00108-1
Efficient crawling through URL ordering

Cited by 570 publications (296 citation statements)
References 5 publications
“…A few systems that gather specialized content have been very successful. Cho et al compare several crawl ordering schemes based on link degree, perceived prestige, and keyword matches on the Stanford University Web [12]. Terveen and Hill use similar techniques to discover related "clans" of Web pages [30]. (See the press articles archived at http://www.cs.berkeley.edu/~soumen/focus/.)…”
Section: Related Work
confidence: 99%
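The crawl ordering schemes mentioned above all reduce to visiting the frontier in priority order under some importance metric. A minimal sketch, assuming a generic `score` function standing in for any of the metrics (backlink degree, prestige, keyword match) rather than the paper's exact formulas:

```python
import heapq

def crawl_ordered(seeds, fetch, score, limit=100):
    """Visit URLs highest-score-first. `fetch(url)` returns the
    page's outlinks; `score(url)` is any importance metric
    (backlink count, PageRank, keyword match) -- an illustrative
    assumption, not a specific metric from the paper."""
    # heapq is a min-heap, so negate scores for max-first order.
    frontier = [(-score(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen = {u for _, u in frontier}
    visited = []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for out in fetch(url):
            if out not in seen:
                seen.add(out)
                heapq.heappush(frontier, (-score(out), out))
    return visited
```

With backlink count as the score, a page linked from two places is fetched before one linked from a single place, which is the basic behavior the ordering-scheme comparison evaluates.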
“…In other studies [6,7], they propose efficient policies to improve the freshness of web pages. In [9], they propose a crawl strategy to download the most important pages first based on different metrics (e.g., similarity between pages and queries, rank of a page, etc.). The research of Castillo et al [5] goes in the same direction.…”
Section: Related Work
confidence: 99%
“…We start by describing related strategies considered in this work: Relevance [9] downloads the most important pages (i.e., based on PageRank) first, in a fixed order. Frequency [7] selects pages to be archived according to their frequency of change, estimated by the Poisson model [8].…”
Section: Pattern-Based Web Crawling
confidence: 99%
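The Poisson change model referenced here admits a simple closed-form rate estimator. A minimal sketch, assuming `polls` equally spaced visits of which `changes_detected` found the page changed (the exact estimator in [8] differs in its bias corrections):

```python
import math

def poisson_change_rate(polls, changes_detected, interval):
    """Estimate a page's change rate (changes per unit time) from
    `polls` visits spaced `interval` apart, where a change was
    detected on `changes_detected` of them.

    Under a Poisson change process, the probability of no change
    within one interval is exp(-rate * interval), so the fraction
    of unchanged polls (n - x)/n yields
    rate = -ln((n - x)/n) / interval.
    """
    n, x = polls, changes_detected
    if x >= n:
        raise ValueError("estimator undefined when every poll saw a change")
    return -math.log((n - x) / n) / interval
```

A crawler can then revisit pages in decreasing order of the estimated rate, which is what frequency-driven archiving strategies do.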
“…The effect of exploiting other hypertext features, such as segmenting the Document Object Model (DOM) tag-trees that characterise a web document, together with a fine-grained topic distillation technique that combines this information with HITS, is studied in [20]. Keyword-sensitive crawling strategies such as URL string analysis and other location metrics are investigated in [21]. An intelligent crawler that can adapt its link-extraction strategy online through a self-learning mechanism is discussed in [22].…”
Section: Related Work in Focused Crawling
confidence: 99%
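The URL string analysis and location metrics mentioned in [21] can be illustrated with a toy scoring function. A minimal sketch, where both the keyword bonus and the path-depth penalty weights are assumptions for illustration, not values from the cited work:

```python
def url_location_score(url, keywords):
    """Score a URL before fetching it, using only the URL string:
    keyword occurrences in the URL plus a location metric that
    prefers shallow paths (fewer slashes, closer to the site root).
    The weights 2.0 and 0.5 are illustrative assumptions."""
    u = url.lower()
    keyword_hits = sum(1 for k in keywords if k in u)
    # Strip the scheme, then count path separators as a depth proxy.
    depth = u.split("://", 1)[-1].count("/")
    return 2.0 * keyword_hits - 0.5 * depth
```

A keyword-sensitive crawler would enqueue extracted links ordered by such a score, so on-topic, shallow URLs are fetched before deep, unrelated ones.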