2018
DOI: 10.14419/ijet.v7i3.12924

An XML based Web Crawler with Page Revisit Policy and Updation in Local Repository of Search Engine

Abstract: With such a large collection of web pages, it is difficult for search engines to keep their online repositories up to date. Major search engines run hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over a network to be stored in the search engine’s database. This results in overutilization of resources such as network bandwidth and CPU cycles. This paper proposes an architecture that tries to reduce the utilization of shared network resources with the help of an advanced XM…
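The abstract breaks off before it describes the proposed XML-based architecture, so what follows is not the authors' method. It is a minimal Python sketch, under illustrative assumptions, of one common way a crawler can save bandwidth when revisiting pages: hash the freshly fetched page, compare it with the checksum kept in the local repository, transmit an update (here a small XML record) only when the page has changed, and adapt the revisit interval by halving it after a change and doubling it otherwise. The `PageRecord` schema, the `revisit` function, the XML element names and the interval bounds are all hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds on the revisit interval, in hours.
MIN_INTERVAL_H = 1.0
MAX_INTERVAL_H = 24.0 * 30


@dataclass
class PageRecord:
    """What the local repository remembers about one URL (hypothetical schema)."""
    url: str
    checksum: str
    interval_h: float  # current revisit interval, in hours


def checksum(content: bytes) -> str:
    """Fingerprint of a page body, used to detect changes cheaply."""
    return hashlib.sha256(content).hexdigest()


def revisit(record: PageRecord, fetched: bytes) -> Optional[str]:
    """Update `record` after refetching its page.

    Returns an XML update message if the page changed, or None if it is
    unchanged, in which case nothing has to be sent to the central repository.
    """
    new_sum = checksum(fetched)
    if new_sum == record.checksum:
        # Unchanged page: back off and visit it less often.
        record.interval_h = min(record.interval_h * 2, MAX_INTERVAL_H)
        return None

    # Changed page: visit it more often and ship a compact XML update.
    record.checksum = new_sum
    record.interval_h = max(record.interval_h / 2, MIN_INTERVAL_H)
    update = ET.Element("page", url=record.url)
    ET.SubElement(update, "checksum").text = new_sum
    ET.SubElement(update, "revisit_hours").text = str(record.interval_h)
    ET.SubElement(update, "content").text = fetched.decode("utf-8", errors="replace")
    return ET.tostring(update, encoding="unicode")
```

For example, a page stored with an 8-hour interval that is refetched unchanged produces no message and its interval grows to 16 hours; if it changes on the next visit, an XML record is produced and the interval drops back to 8 hours. Multiplicative back-off is only a simple heuristic; revisit policies that estimate each page's change rate are a common alternative.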

Cited by 5 publications (2 citation statements). References: 10 publications.
“…Crawling. Nowadays, many companies do not list their job openings on the common job listing portals on the web [35][36][37][38][39][40][41][42][43][44][45][46][47][48][49]. It was decided to individually crawl the API calls [50,51] of these companies' personal job portals.…”
Section: Standalone Company Website; citation type: mentioning (confidence: 99%)
“…In the future, we plan to analyze and solve additional privacy concerns in location service recommendations, as well as improve the performance, using a parallel and distributed recommendations system to manage the required massive data. This model may also improve the results of [15,[23][24][25][26][27][28][29].…”
Section: Evaluation Metrics; citation type: mentioning (confidence: 99%)