Proceedings of the 2012 ACM Symposium on Document Engineering, 2012
DOI: 10.1145/2361354.2361380

Structural and visual comparisons for web page archiving

Abstract: In this paper, we propose a Web page archiving system that combines state-of-the-art comparison methods based on the source codes of Web pages, with computer vision techniques. To detect whether successive versions of a Web page are similar or not, our system is based on: (1) a combination of structural and visual comparison methods embedded in a statistical discriminative model, (2) a visual similarity measure designed for Web pages that improves change detection, (3) a supervised feature selection method ada…

Cited by 6 publications (5 citation statements)
References 13 publications

“…On a structural level of HTML documents (Adar et al, 2009b) presented algorithms for describing DOM tree elements modifications as well as persistence of structural blocks. Several recent studies make use of visual features of web pages to detect and measure changes (Saad and Gançarski, 2010; Law et al, 2012).…”
Section: Methods Of Change Detection
mentioning
confidence: 99%
“…Similarly, experiment automation in Plato and, equally important, the feasibility of large-scale preservation operations in general, is entirely dependent on the existence of well-tested, efficient and effective mechanisms for quality assurance. Recent work is showing promising advances (Jurik and Nielsen, 2012; Bauer and Becker, 2011; Law et al, 2012), but there is still a wide gap to be addressed for preservation operations to be broadly supported. It seems crucial that this gap is made explicit and shared with a wide community so that efforts to close it can be based on a solid assessment of the shortcomings of existing tools rather than isolated ad hoc identification of application scenarios within single institutions, as is often practiced today.…”
Section: Coverage and Correctness Of Available Measurement Techniques
mentioning
confidence: 99%
“…Digital preservation content profiles and an automated rendering and comparison tool (Law et al, 2012). A prioritization approach is taken to target first and foremost those aspects that are perceived most critical.…”
mentioning
confidence: 99%
“…It already implements source adaptors for the PRONOM registry, content profiles from C3PO, repository events (ingest, access, and migration), policies and other specific adaptors. The combination of content profiles from C3PO with repository events from the Report API provides a complete overview of Continuous automated rendering experiments [26] can be used to track the ability of viewing environments to display content and verify whether it corresponds to the original performance (Law et al, 2012).…”
Section: Scout: Scalable Monitoring
mentioning
confidence: 99%