ARCOMEM Crawling Architecture
2014
DOI: 10.3390/fi6030518

Abstract: The World Wide Web is the largest information repository available today. However, this information is very volatile and Web archiving is essential to preserve it for the future. Existing approaches to Web archiving are based on simple definitions of the scope of Web pages to crawl and are limited to basic interactions with Web servers. The aim of the ARCOMEM project is to overcome these limitations and to provide flexible, adaptive and intelligent content acquisition, relying on social media to create topical…

Cited by 7 publications (5 citation statements)
References 39 publications
“…One possible way to harvest data from a TOR exit node is to create a web crawler, which is a programming script that is used to open web pages and copy text and tags from each page based on instructions within the script [36]. Dark web sites are usually not crawled by generic crawlers, because the web servers are hidden in the TOR network and require specific protocols to be accessed.…”
Section: Creating a Dark Web Crawler
Mentioning confidence: 99%
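The quoted passage describes such a crawler only in prose. A minimal Python sketch of a TOR-aware fetcher follows, assuming a standard Tor client exposing its default SOCKS5 proxy on 127.0.0.1:9050 and the requests[socks] extra installed; the .onion address is a hypothetical placeholder.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: a local Tor client exposes a SOCKS5 proxy on 127.0.0.1:9050
# (the default for the standard Tor daemon). The "socks5h" scheme makes the
# proxy resolve hostnames, which is required for .onion addresses.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_page(url):
    """Open one page through Tor and return its parsed text and tags."""
    try:
        resp = requests.get(url, proxies=TOR_PROXIES, timeout=60)
        resp.raise_for_status()
    except requests.RequestException:
        return None  # hidden services are slow and frequently unreachable
    return BeautifulSoup(resp.text, "html.parser")

# Hypothetical entry point; real .onion addresses are service-specific.
soup = fetch_page("http://exampleonionaddress.onion/")
if soup is not None:
    print(soup.get_text()[:200])                         # copied text
    print([a.get("href") for a in soup.find_all("a")])   # copied link tags
```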
“…AppleScript is a process automation utility for macOS, comparable to PowerShell on Microsoft Windows. Creating a web crawler with this approach requires that the structure of the website be carefully mapped out in advance [36]. The approach works on the principle that one starts at the home page and copies the information, as a web crawler scraping Amazon.com would.…”
Section: Creating a Dark Web Crawler
Mentioning confidence: 99%
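The quote describes scripted navigation over a site whose structure was mapped in advance. Keeping to Python rather than AppleScript for consistency with the other sketches here, the same idea might look as follows; the base URL, paths, and CSS selectors are all hypothetical placeholders.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: the site's structure was mapped out in advance, so the crawler
# follows a fixed plan rather than discovering links as it goes.
BASE_URL = "https://example.com"
SITE_MAP = {
    "home":    {"path": "/",       "selector": "div.listing a"},
    "product": {"path": "/item/1", "selector": "div.description"},
}

def copy_information(page):
    """Open one mapped page and copy the text of its mapped elements."""
    resp = requests.get(BASE_URL + page["path"], timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(page["selector"])]

# Start at the home page and copy information, as the quoted approach does.
print(copy_information(SITE_MAP["home"]))
```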
“…The next step in gathering data from our Dark Web marketplace was to create a Web crawler; a Web crawler is a programming script that is used to open Web pages and then copy the text and tags from each page based on instructions within the script [59]. Many Web crawlers are written in the Python scripting language [32,56-58], often leveraging the command-line cURL tool to open the target Web pages.…”
Section: Creating a Web Crawler
Mentioning confidence: 99%
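A short sketch of that cURL-driven pattern, assuming the curl binary is on the PATH; the marketplace URL is a placeholder, and only standard curl options are used.

```python
import subprocess
from bs4 import BeautifulSoup

def curl_fetch(url):
    """Open a target page with the command-line cURL tool.
    -s silences the progress meter; -L follows redirects."""
    result = subprocess.run(
        ["curl", "-s", "-L", url],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return result.stdout

# Hypothetical target URL. For a Tor hidden service, curl's
# --socks5-hostname 127.0.0.1:9050 option could be added to the command.
html = curl_fetch("https://example.com/market/listings")
soup = BeautifulSoup(html, "html.parser")
print(soup.title.string if soup.title else "no <title> found")
```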
“…For example, if a topic relevant to the intelligent crawl specification is found in the anchor text of a link pointing to an external Web site, this link may be prioritized over other links on the page. More details about the crawling strategy can be found in [44].…”
Section: Crawler Guidance
Mentioning confidence: 99%
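The prioritization rule in the quote can be made concrete with a small sketch; the topic list and scoring weights below are illustrative assumptions, not the ARCOMEM implementation described in [44].

```python
from urllib.parse import urlparse

CRAWL_TOPICS = {"financial crisis", "bailout", "austerity"}  # hypothetical

def priority(link_url, anchor_text, page_host):
    """Higher score = fetched earlier. Links whose anchor text mentions a
    crawl topic are boosted; a topical link to an external site gets an
    extra boost, mirroring the example in the quote."""
    text = anchor_text.lower()
    score = sum(1.0 for topic in CRAWL_TOPICS if topic in text)
    if score > 0 and urlparse(link_url).hostname != page_host:
        score += 0.5  # topical link pointing to an external Web site
    return score

links = [
    ("https://other.example/bailout-timeline", "Greek bailout timeline"),
    ("https://current.example/about", "About us"),
]
ranked = sorted(links,
                key=lambda l: priority(l[0], l[1], "current.example"),
                reverse=True)
print(ranked[0][0])  # the topical external link is crawled first
```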
“…The best-N-first (BFS) strategy prioritizes links according to the relevance of their parent page and fetches the first pages of the queue. In contrast, the ARCOMEM crawler uses entities instead of n-grams to determine the relevance of a page [44].…”
Section: Related Work
Mentioning confidence: 99%
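A sketch of the best-N-first frontier the quote describes: links inherit the relevance of their parent page, and the crawler repeatedly fetches the N highest-scoring URLs from the queue. The relevance values are stand-ins; per the quote, ARCOMEM scores pages with entities rather than n-grams.

```python
import heapq
import itertools

counter = itertools.count()  # tie-breaker so heapq never compares URLs
frontier = []  # min-heap; negate relevance to pop the best link first

def enqueue(url, parent_relevance):
    """Queue a link with the relevance score of its parent page."""
    heapq.heappush(frontier, (-parent_relevance, next(counter), url))

def next_batch(n):
    """Pop the N most promising URLs, best-N-first style."""
    return [heapq.heappop(frontier)[2]
            for _ in range(min(n, len(frontier)))]

enqueue("http://a.example/page", parent_relevance=0.9)
enqueue("http://b.example/page", parent_relevance=0.2)
enqueue("http://c.example/page", parent_relevance=0.7)
print(next_batch(2))  # ['http://a.example/page', 'http://c.example/page']
```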