2012
DOI: 10.4304/jait.3.1.36-47

A Hybrid Revisit Policy For Web Search

Abstract:

A crawler is a program that retrieves and stores pages from the Web, commonly for a Web search engine. A crawler often has to download hundreds of millions of pages in a short period of time and must constantly monitor and refresh the downloaded pages. Once the crawler has downloaded a significant number of pages, it has to start revisiting them in order to keep the downloaded collection fresh. Due to resource constraints, search engines usually have difficulty keeping the entire l…
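
The abstract frames revisiting as a scheduling problem under a limited download budget. As a rough sketch only (this page does not reproduce the paper's actual hybrid policy), the Python below models a revisit queue in which pages come due at policy-assigned intervals and at most a fixed budget of pages is refreshed per cycle; the class name, parameters, and interval source are all assumptions for illustration.

```python
import heapq
import time

class RevisitScheduler:
    """Illustrative sketch (not the paper's hybrid policy): pages are
    revisited when they come due, subject to a per-cycle refresh budget."""

    def __init__(self, budget_per_cycle):
        self.budget = budget_per_cycle   # max pages refreshed per cycle
        self.queue = []                  # min-heap of (next_due_time, url)

    def schedule(self, url, revisit_interval):
        # revisit_interval would be supplied by the policy in force
        # (uniform, proportional, or some hybrid of the two).
        heapq.heappush(self.queue, (time.time() + revisit_interval, url))

    def due_pages(self, now=None):
        """Pop at most `budget` pages whose revisit time has arrived."""
        now = time.time() if now is None else now
        due = []
        while self.queue and self.queue[0][0] <= now and len(due) < self.budget:
            due.append(heapq.heappop(self.queue)[1])
        return due
```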

Cited by 1 publication (2 citation statements)
References: 12 publications

“…The crawlers were allowed to crawl all web sites under the domain of "emu.edu.tr" and the extracted outer links (if any). The number of web pages processed by these crawlers was recorded in Table V. [Section 3.5, Experiment 5: the performance of the re-visiting policies] In this experiment, the re-visiting performance was investigated for the WBC and the following three re-visiting policies: the uniform (Pichler et al, 2011; Bhute and Meshram, 2010; Leng et al, 2011; Sharma et al, 2012; Singh and Vikasn, 2014); the proportional by rank (Pichler et al, 2011; Bhute and Meshram, 2010; Leng et al, 2011); and the proportional by top N levels (Pichler et al, 2011; Cho et al, 2012). This experiment was repeated for seven days, with one crawler of five threads used for each policy.…”
Section: Experiment 3: Watcher File Effects on the Site Servers
Citation type: mentioning (confidence: 99%)
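
The three policies named in this quotation differ only in how a fixed visit budget is spread across pages. Below is a minimal sketch of that allocation, assuming simple definitions of "rank" and "level"; the function signature and field names are assumptions, since the cited papers' exact formulations are not reproduced on this page.

```python
def revisit_frequencies(pages, total_visits, policy="uniform", top_n=2):
    """Sketch of the three re-visiting policies named above; field names
    ('rank', 'level') and the exact formulas are illustrative assumptions.
    Returns url -> visits allotted in one scheduling window."""
    if policy == "uniform":
        # Every page gets an equal share of the visit budget.
        share = total_visits / len(pages)
        return {p["url"]: share for p in pages}
    if policy == "rank":
        # Visits proportional to each page's rank score.
        total_rank = sum(p["rank"] for p in pages)
        return {p["url"]: total_visits * p["rank"] / total_rank
                for p in pages}
    if policy == "top_levels":
        # Only pages within the top N levels of the site tree are revisited.
        eligible = sum(1 for p in pages if p["level"] <= top_n)
        share = total_visits / eligible
        return {p["url"]: (share if p["level"] <= top_n else 0.0)
                for p in pages}
    raise ValueError(f"unknown policy: {policy}")

pages = [{"url": "a", "rank": 6, "level": 1},
         {"url": "b", "rank": 3, "level": 2},
         {"url": "c", "rank": 1, "level": 3}]
print(revisit_frequencies(pages, 30, policy="rank"))
# {'a': 18.0, 'b': 9.0, 'c': 3.0}
```
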
“…(1) Uniform policy: in this policy, the entire web site is downloaded at each visit (Bhute and Meshram, 2010; Pichler et al, 2011; Leng et al, 2011; Sharma et al, 2012; Singh and Vikasn, 2014). Although this approach enriches the database, it requires a large amount of processing time.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
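
The tradeoff the quotation notes (a richer database at the price of large processing time) can be made concrete with a back-of-the-envelope estimate; all numbers below are hypothetical.

```python
def uniform_revisit_cost(num_pages, avg_fetch_seconds, visits_per_day):
    """Hypothetical cost of the uniform policy, which re-downloads
    every page on every visit."""
    return num_pages * avg_fetch_seconds * visits_per_day

# E.g. 100,000 pages at ~0.5 s per fetch, revisited twice a day:
# 100,000 * 0.5 * 2 = 100,000 crawl-seconds (~27.8 hours) per day,
# which is why the proportional policies trade coverage for time.
print(uniform_revisit_cost(100_000, 0.5, 2))  # 100000.0
```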