ARCOMEM Crawling Architecture
2014
DOI: 10.3390/fi6030518

Abstract: The World Wide Web is the largest information repository available today. However, this information is very volatile and Web archiving is essential to preserve it for the future. Existing approaches to Web archiving are based on simple definitions of the scope of Web pages to crawl and are limited to basic interactions with Web servers. The aim of the ARCOMEM project is to overcome these limitations and to provide flexible, adaptive and intelligent content acquisition, relying on social media to create topical…

Cited by 7 publications (5 citation statements)
References 39 publications
“…One possible way to harvest data from a TOR exit node is to create a web crawler, which is a programming script that is used to open web pages and copy text and tags from each page based on instructions within the script [36]. Dark web sites are usually not crawled by generic crawlers, because the web servers are hidden in the TOR network and require specific protocols to be accessed.…”
Section: Creating a Dark Web Crawler
Mentioning confidence: 99%
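The quoted passage describes such a crawler only in prose. A minimal Python sketch of a TOR-aware fetcher follows, assuming a standard Tor client exposing its default SOCKS5 proxy on 127.0.0.1:9050 and the requests[socks] extra installed; the .onion address is a hypothetical placeholder.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: a local Tor client exposes a SOCKS5 proxy on 127.0.0.1:9050
# (the default for the standard Tor daemon). The "socks5h" scheme makes the
# proxy resolve hostnames, which is required for .onion addresses.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_page(url):
    """Open one page through Tor and return its parsed text and tags."""
    try:
        resp = requests.get(url, proxies=TOR_PROXIES, timeout=60)
        resp.raise_for_status()
    except requests.RequestException:
        return None  # hidden services are slow and frequently unreachable
    return BeautifulSoup(resp.text, "html.parser")

# Hypothetical entry point; real .onion addresses are service-specific.
soup = fetch_page("http://exampleonionaddress.onion/")
if soup is not None:
    print(soup.get_text()[:200])                         # copied text
    print([a.get("href") for a in soup.find_all("a")])   # copied link tags
```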
“…AppleScript is a process automation utility for macOS, comparable to PowerShell on Microsoft Windows. Creating a web crawler with this approach requires that the structure of the website be carefully mapped out in advance [36]. The approach works on the principle that one starts at the home page and copies the information, as a web crawler scraping Amazon.com would.…”
Section: Creating a Dark Web Crawler
Mentioning confidence: 99%
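The quote describes scripted navigation over a site whose structure was mapped in advance. Keeping to Python rather than AppleScript for consistency with the other sketches here, the same idea might look as follows; the base URL, paths, and CSS selectors are all hypothetical placeholders.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: the site's structure was mapped out in advance, so the crawler
# follows a fixed plan rather than discovering links as it goes.
BASE_URL = "https://example.com"
SITE_MAP = {
    "home":    {"path": "/",       "selector": "div.listing a"},
    "product": {"path": "/item/1", "selector": "div.description"},
}

def copy_information(page):
    """Open one mapped page and copy the text of its mapped elements."""
    resp = requests.get(BASE_URL + page["path"], timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(page["selector"])]

# Start at the home page and copy information, as the quoted approach does.
print(copy_information(SITE_MAP["home"]))
```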
“…The next step in gathering data from our Dark Web marketplace was to create a Web crawler; a Web crawler is a programming script that is used to open Web pages and then copy the text and tags from each page based on instructions within the script [59]. Many Web crawlers are written in the Python scripting language [32,56-58], often leveraging the command-line cURL tool to open the target Web pages.…”
Section: Creating a Web Crawler
Mentioning confidence: 99%
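A short sketch of that cURL-driven pattern, assuming the curl binary is on the PATH; the marketplace URL is a placeholder, and only standard curl options are used.

```python
import subprocess
from bs4 import BeautifulSoup

def curl_fetch(url):
    """Open a target page with the command-line cURL tool.
    -s silences the progress meter; -L follows redirects."""
    result = subprocess.run(
        ["curl", "-s", "-L", url],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return result.stdout

# Hypothetical target URL. For a Tor hidden service, curl's
# --socks5-hostname 127.0.0.1:9050 option could be added to the command.
html = curl_fetch("https://example.com/market/listings")
soup = BeautifulSoup(html, "html.parser")
print(soup.title.string if soup.title else "no <title> found")
```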
“…For example, if a topic relevant to the intelligent crawl specification is found in the anchor text of a link pointing to an external Web site, this link may be prioritized over other links on the page. More details about the crawling strategy can be found in [44].…”
Section: Crawler Guidance
Mentioning confidence: 99%
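The prioritization rule in the quote can be made concrete with a small sketch; the topic list and scoring weights below are illustrative assumptions, not the ARCOMEM implementation described in [44].

```python
from urllib.parse import urlparse

CRAWL_TOPICS = {"financial crisis", "bailout", "austerity"}  # hypothetical

def priority(link_url, anchor_text, page_host):
    """Higher score = fetched earlier. Links whose anchor text mentions a
    crawl topic are boosted; a topical link to an external site gets an
    extra boost, mirroring the example in the quote."""
    text = anchor_text.lower()
    score = sum(1.0 for topic in CRAWL_TOPICS if topic in text)
    if score > 0 and urlparse(link_url).hostname != page_host:
        score += 0.5  # topical link pointing to an external Web site
    return score

links = [
    ("https://other.example/bailout-timeline", "Greek bailout timeline"),
    ("https://current.example/about", "About us"),
]
ranked = sorted(links,
                key=lambda l: priority(l[0], l[1], "current.example"),
                reverse=True)
print(ranked[0][0])  # the topical external link is crawled first
```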
“…The best-N-first (BFS) strategy prioritizes links according to the relevance of their parent page and fetches the first pages of the queue. In contrast, the ARCOMEM crawler uses entities instead of n-grams to determine the relevance of a page [44].…”
Section: Related Work
Mentioning confidence: 99%
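A sketch of the best-N-first frontier the quote describes: links inherit the relevance of their parent page, and the crawler repeatedly fetches the N highest-scoring URLs from the queue. The relevance values are stand-ins; per the quote, ARCOMEM scores pages with entities rather than n-grams.

```python
import heapq
import itertools

counter = itertools.count()  # tie-breaker so heapq never compares URLs
frontier = []  # min-heap; negate relevance to pop the best link first

def enqueue(url, parent_relevance):
    """Queue a link with the relevance score of its parent page."""
    heapq.heappush(frontier, (-parent_relevance, next(counter), url))

def next_batch(n):
    """Pop the N most promising URLs, best-N-first style."""
    return [heapq.heappop(frontier)[2]
            for _ in range(min(n, len(frontier)))]

enqueue("http://a.example/page", parent_relevance=0.9)
enqueue("http://b.example/page", parent_relevance=0.2)
enqueue("http://c.example/page", parent_relevance=0.7)
print(next_batch(2))  # ['http://a.example/page', 'http://c.example/page']
```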