2012
DOI: 10.4304/jait.3.1.36-47

A Hybrid Revisit Policy For Web Search

Abstract:

A crawler is a program that retrieves and stores pages from the Web, commonly for a Web search engine. A crawler often has to download hundreds of millions of pages in a short period of time and must constantly monitor and refresh the downloaded pages. Once the crawler has downloaded a significant number of pages, it has to start revisiting them in order to keep the downloaded collection fresh. Due to resource constraints, search engines usually have difficulty keeping the entire l…
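
The abstract frames revisiting as a scheduling problem under a limited download budget. As a rough sketch only (this page does not reproduce the paper's actual hybrid policy), the Python below models a revisit queue in which pages come due at policy-assigned intervals and at most a fixed budget of pages is refreshed per cycle; the class name, parameters, and interval source are all assumptions for illustration.

```python
import heapq
import time

class RevisitScheduler:
    """Illustrative sketch (not the paper's hybrid policy): pages are
    revisited when they come due, subject to a per-cycle refresh budget."""

    def __init__(self, budget_per_cycle):
        self.budget = budget_per_cycle   # max pages refreshed per cycle
        self.queue = []                  # min-heap of (next_due_time, url)

    def schedule(self, url, revisit_interval):
        # revisit_interval would be supplied by the policy in force
        # (uniform, proportional, or some hybrid of the two).
        heapq.heappush(self.queue, (time.time() + revisit_interval, url))

    def due_pages(self, now=None):
        """Pop at most `budget` pages whose revisit time has arrived."""
        now = time.time() if now is None else now
        due = []
        while self.queue and self.queue[0][0] <= now and len(due) < self.budget:
            due.append(heapq.heappop(self.queue)[1])
        return due
```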

Cited by 1 publication (2 citation statements)
References: 12 publications

“…The crawlers were allowed to crawl all web sites under the domain of "emu.edu.tr" and the extracted outer links (if any). The number of web pages processed by these crawlers was recorded in Table V. [Section 3.5, Experiment 5: the performance of the re-visiting policies] In this experiment, the re-visiting performance was investigated for the WBC and the following three re-visiting policies: the uniform (Pichler et al, 2011; Bhute and Meshram, 2010; Leng et al, 2011; Sharma et al, 2012; Singh and Vikasn, 2014); the proportional by rank (Pichler et al, 2011; Bhute and Meshram, 2010; Leng et al, 2011); and the proportional by top N levels (Pichler et al, 2011; Cho et al, 2012). This experiment was repeated for seven days, with one crawler of five threads used for each policy.…”
Section: Experiment 3: Watcher File Effects on the Site Servers
Citation type: mentioning (confidence: 99%)
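
The three policies named in this quotation differ only in how a fixed visit budget is spread across pages. Below is a minimal sketch of that allocation, assuming simple definitions of "rank" and "level"; the function signature and field names are assumptions, since the cited papers' exact formulations are not reproduced on this page.

```python
def revisit_frequencies(pages, total_visits, policy="uniform", top_n=2):
    """Sketch of the three re-visiting policies named above; field names
    ('rank', 'level') and the exact formulas are illustrative assumptions.
    Returns url -> visits allotted in one scheduling window."""
    if policy == "uniform":
        # Every page gets an equal share of the visit budget.
        share = total_visits / len(pages)
        return {p["url"]: share for p in pages}
    if policy == "rank":
        # Visits proportional to each page's rank score.
        total_rank = sum(p["rank"] for p in pages)
        return {p["url"]: total_visits * p["rank"] / total_rank
                for p in pages}
    if policy == "top_levels":
        # Only pages within the top N levels of the site tree are revisited.
        eligible = sum(1 for p in pages if p["level"] <= top_n)
        share = total_visits / eligible
        return {p["url"]: (share if p["level"] <= top_n else 0.0)
                for p in pages}
    raise ValueError(f"unknown policy: {policy}")

pages = [{"url": "a", "rank": 6, "level": 1},
         {"url": "b", "rank": 3, "level": 2},
         {"url": "c", "rank": 1, "level": 3}]
print(revisit_frequencies(pages, 30, policy="rank"))
# {'a': 18.0, 'b': 9.0, 'c': 3.0}
```
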
“…(1) Uniform policy: in this policy, the entire web site is downloaded at each visit (Bhute and Meshram, 2010; Pichler et al, 2011; Leng et al, 2011; Sharma et al, 2012; Singh and Vikasn, 2014). Although this approach enriches the database, it requires a large amount of processing time.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
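
The tradeoff the quotation notes (a richer database at the price of large processing time) can be made concrete with a back-of-the-envelope estimate; all numbers below are hypothetical.

```python
def uniform_revisit_cost(num_pages, avg_fetch_seconds, visits_per_day):
    """Hypothetical cost of the uniform policy, which re-downloads
    every page on every visit."""
    return num_pages * avg_fetch_seconds * visits_per_day

# E.g. 100,000 pages at ~0.5 s per fetch, revisited twice a day:
# 100,000 * 0.5 * 2 = 100,000 crawl-seconds (~27.8 hours) per day,
# which is why the proportional policies trade coverage for time.
print(uniform_revisit_cost(100_000, 0.5, 2))  # 100000.0
```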