2020
DOI: 10.1109/access.2020.3004756

SIMHAR - Smart Distributed Web Crawler for the Hidden Web Using SIM+Hash and Redis Server

Abstract: Developing a distributed web crawler poses major engineering challenges, all of which are ultimately related to scale. To maintain a search engine's corpus and keep it reasonably fresh, the crawler must be distributed over multiple computers. In distributed crawling, crawling agents are assigned the task of fetching and downloading web pages. The number of web pages and the heterogeneity of their structure are increasing rapidly, which makes performance a serious challenge for web crawler systems. In this paper, a distribut…

Cited by 10 publications (8 citation statements)
References 23 publications
“…Along with jasmine directory and amazon, 20 real websites from Alexa's list of top sites are exhaustively crawled to check at which depth most web pages are found. Our observation is similar to [37]. Below the depth of 6, the crawler was not able to find a considerable percentage of forms.…”
Section: Path Learning (supporting)
confidence: 87%
“…The links from W are kept in frontier for seed URLs. Further links are kept in fetched link frontier [37]. The proposed crawler is focused on property, book, flight, hotel, music, premier and product domains.…”
Section: Framework Of Ichw (mentioning)
confidence: 99%
“…A summary of existing research related to distributed crawling is shown in [32]. Other distributed crawling approaches include crawling the hidden web [33], a web crawling solution deployed by a cloud service [34], and a crawler that extracts information only regarding certain topic by classifying the crawled articles [35].…”
Section: Related Work (mentioning)
confidence: 99%