Catching web crawlers in the act

Lourenço, Anália; Belo, Orlando

doi:10.1145/1145581.1145634

Cited by 27 publications

(28 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…Our analysis reveals that such observation is generally, but not always, true. Among all the logs, unassigned referrer field was found in about 25% of the requests, which is about half of the number reported in an earlier study (Lourenco and Belo, 2006). Among the rest, about 60% contained URL external to Microsoft.…”

Section: Referrermentioning

confidence: 75%

“…Dikaiakos and Stassopoulou Kumar, 2000, 2002) Tan and Kumer (Dikaiakos et al, 2003(Dikaiakos et al, , 2005Stassopoulou and Dikaiakos, 2006) Almeida et al (Almeida et al, 2001) Lourenco and Belo (Lourenco and Belo, 2006) They used most of the information included in the log including some variations on traffic patterns. In particular, they examined HTTP-traffic characteristics (e.g., methods, error codes, etc) as well as resource referencing behavior (e.g., file type, percentage of distinct requests, resource popularity, and concentration of requests, etc).…”

Section: This Papermentioning

confidence: 99%

“…Unassigned referrer field is often considered one of the most apparent characteristics of web robots Kumar, 2000, 2002;Lourenco and Belo, 2006). As typical web robots parse a page and build up the list of pages to be visited, referrer field is frequently left blank.…”

Section: Referrermentioning

confidence: 99%

See 2 more Smart Citations

Classification of web robots: An empirical study based on over one billion requests

Lee

Cha

Lee

et al. 2009

Computers & Security

View full text Add to dashboard Cite

Section: Referrermentioning

confidence: 75%

Section: This Papermentioning

confidence: 99%

See 1 more Smart Citation

Classification of web robots: An empirical study based on over one billion requests

Lee

Cha

Lee

et al. 2009

Computers & Security

View full text Add to dashboard Cite

“…The crawler will automatic crawl information to avoid this robot.txt file will be removed [17]. In this paper, one lexical pattern is defined in which the robot.txt file will never be crawled.…”

Section: Descriptionmentioning

confidence: 99%

Web Usage Mining for Discovery and Evaluation of Online Navigation Pattern Prediction

Mehta¹,

Jadhav²,

Joshi³

2014

IJCA

View full text Add to dashboard Cite

“…The present work used the pattern mining approach introduced in [4,3], involving the semi-automatic labeling of a training set of Web sessions and tree model induction. Besides crawler and regular user sessions there were identified browser-related application sessions ( …”

Section: Profile Analysismentioning

confidence: 99%

Evaluating Web Site Structure Based on Navigation Profiles and Site Topology

Simões

Lourenço

Almeida

2013

Advances in Intelligent Systems and Computing

View full text Add to dashboard Cite

Abstract. This work aims at pointing out the benefits of a topologyoriented wide scope, but differentiated, profile analysis. The goal was to conciliate advanced common website usage profiling techniques with the analysis of the website's topology information, outputting valuable knowledge in an intuitive and comprehensible way. Server load balancing, crawler activity evaluation and Web site restructuring are the primary analysis concerns and, in this regard, experiments over six month data of a real-world Web site were considered successful.

show abstract

Catching web crawlers in the act

Cited by 27 publications

References 9 publications

Classification of web robots: An empirical study based on over one billion requests

Classification of web robots: An empirical study based on over one billion requests

Web Usage Mining for Discovery and Evaluation of Online Navigation Pattern Prediction

Evaluating Web Site Structure Based on Navigation Profiles and Site Topology

Contact Info

Product

Resources

About