2017
DOI: 10.1002/widm.1218

A survey of Web crawlers for information retrieval

Abstract: Performance of any search engine relies heavily on its Web crawler. Web crawlers are the programs that get webpages from the Web by following hyperlinks. These webpages are indexed by a search engine and can be retrieved by a user query. In the area of Web crawling, we still lack an exhaustive study that covers all crawling techniques. This study follows the guidelines of systematic literature review and applies it to the field of Web crawling. We used the standard procedure of carrying out a systematic litera…

Citations: cited by 61 publications (30 citation statements)
References: 265 publications (397 reference statements)
“…Seminal research into the graph-theoretical characteristics of the Web simplified its representation by assuming static relationships (edges) between static pages (nodes). Increasingly sophisticated Web crawlers [25] were deployed to traverse hyperlinkage structures, providing insights into its geometric characteristics [9]. Subsequent research has emphasized the importance of studying Internet topology [18,31,54].…”
Section: Background and Motivations
Type: mentioning
confidence: 99%
“…The nature, structure, and influence of the Web have been subject to an overwhelming body of research. However, its rapid rate of evolution and sheer magnitude have outpaced even the most advanced tools used in its study [25,44]. The expansive scale of the modern Web has necessitated increasingly sophisticated strategies for its traversal [6,10].…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Despite simple in principle, the variety of purposes for which data is collected as well as technical issues, made crawling a rich and complex research field. As a measure of the interest for this topic, in a recent survey [4] the authors claim they censused 1488 articles about crawling. Restricting to the real implementations the authors listed 62 works.…”
Section: Crawling
Type: mentioning
confidence: 99%
“…In the case of continuously raising of the requirements of big data, however, talent for big data is in short supply, there are many online recruitment website, for instance, 51job.com [2], ChinaHR, Zhaopin.com [3], www.lagou.com, these websites have become the carrier of a large amount of recruitment information. In the ocean of Web, finding information is like finding a needle in the haystack [4]. The search engine is used to find information on the Web.…”
Section: Introduction
Type: mentioning
confidence: 99%