2010
DOI: 10.5120/1593-2140

Smart Approach to Reduce the Web Crawling Traffic of Existing System using HTML based Update File at Web Server

Abstract: A web crawler downloads information from the web. Because web pages change without notice, a crawler must frequently revisit websites to check for updates. An estimated 40% of present internet traffic is due to web crawling. In this paper we propose a file that maintains the list of URLs of the updated pages of a website. The format of the file is based on HTML. A crawler visits only the UPDATE file and need not revisit the full website to learn what has changed. This scheme can easily implement on …
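The crawler side of this idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract says only that the UPDATE file is HTML based, so the layout assumed here (one `<a href>` per updated page) and the names `UpdateFileParser`, `urls_to_revisit`, and `SAMPLE_UPDATE_FILE` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class UpdateFileParser(HTMLParser):
    """Collects href values from <a> tags in a hypothetical UPDATE file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.updated_urls = []

    def handle_starttag(self, tag, attrs):
        # Assumption: each updated page is listed as an anchor tag.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the site root.
                self.updated_urls.append(urljoin(self.base_url, value))


def urls_to_revisit(update_file_html, base_url):
    """Return the absolute URLs listed in the UPDATE file, in document order."""
    parser = UpdateFileParser(base_url)
    parser.feed(update_file_html)
    parser.close()
    return parser.updated_urls


# Hypothetical UPDATE file content; a real crawler would fetch this
# single file from the server instead of re-downloading every page.
SAMPLE_UPDATE_FILE = """
<html><body>
<a href="/news/today.html">news</a>
<a href="/products/list.html">products</a>
</body></html>
"""

if __name__ == "__main__":
    for url in urls_to_revisit(SAMPLE_UPDATE_FILE, "http://example.com/"):
        print(url)
```

Under these assumptions the crawler issues one request for the UPDATE file and then fetches only the pages it lists, which is where the traffic saving would come from.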

Cited by 3 publications (4 citation statements)
References 9 publications
“…Third, up to our knowledge, most of the crawling techniques require communication between the running crawlers which increases the crawling processing time and requires high-quality networks (Mukhopadhyay et al, 2006; Wu and Lai, 2010; Kumar and Neelima, 2011; Agarwal et al, 2012; Amolochitis et al, 2013; Uzun et al, 2013). The fourth crawling problem is that the conventional crawlers work is based on the URLs, and download only the pages that are allocated on the web site server, and therefore, they are inefficient when dealing with AJAX pages, as they cannot index the web sites dynamic information (Mishra et al, 2010; Nath and Bal, 2011; Bhushan et al, 2012). In addition to the above static crawling problems, AJAX crawling techniques are still suffering from several challenging problems such as the following: first, identifying web page's states: in some cases, in order to identify the page states the AJAX events need to be triggered, and this may lead to change the content of the corresponding page without changing the page URL, and such page will be recognized as one of the page's states.…”
Section: Crawlers Challenging Problems (mentioning)
confidence: 99%
“…In addition, the conventional crawler techniques require downloading all web site pages to find the updated ones, and this will increase the internet traffic and the bandwidth consumption. It has been found that approximately 40 percent of the current internet traffic, bandwidth consumption and web requests are due to search engine crawlers (Mishra et al, 2010; Nath and Bal, 2011). To solve this issue, mobile crawlers (Mishra et al, 2010; Nath and Bal, 2011) and sitemaps-based crawlers (Schonfeld and Shivakumar, 2009; Bhushan et al, 2012; Brawer et al, 2013) were introduced.…”
Section: Related Work (mentioning)
confidence: 99%