2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2021
DOI: 10.1109/jcdl52503.2021.00027

Profiling Web Archival Voids for Memento Routing

citations
Cited by 6 publications
(2 citation statements)
references
References 33 publications
“…The hashes of raw mementos are stored and made available via the Internet Archive's CDX API [68]. However, this hashing of the WARC Payload Digest is sensitive to content-encoding [69]. That is, if the payload stored was received as compressed (e.g., GZip or Brotli [70]), then the payload digest would be different than if it were in plain text, even if the content served to the client at replay time would be the same (as any stored content-encoding is undone at replay time).…”
Section: Verifying the Fixity Of Digital Resources (mentioning)
confidence: 99%
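
The content-encoding sensitivity described in this statement can be illustrated with a minimal sketch. It is not taken from the cited paper or from any archive's code. It assumes the digest is a SHA-1 over the stored payload bytes, written in the `sha1:<base32>` form often seen in WARC records. The helper name `payload_digest` and the sample HTML are hypothetical.

```python
import base64
import gzip
import hashlib


def payload_digest(payload: bytes) -> str:
    """Hypothetical helper: SHA-1 of the payload bytes, Base32-encoded.

    Assumption: this mirrors the 'sha1:<base32>' style of payload digest;
    a real archive may use a different algorithm or encoding.
    """
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


# The same page body, as it would be stored if received uncompressed...
plain = b"<html><body>Hello, archive!</body></html>"
# ...and as it would be stored if the server sent Content-Encoding: gzip.
compressed = gzip.compress(plain)

digest_plain = payload_digest(plain)
digest_gzip = payload_digest(compressed)

# At replay time the stored content-encoding is undone, so the client
# receives the same bytes in both cases.
digest_replayed = payload_digest(gzip.decompress(compressed))

print("stored as plain text:", digest_plain)
print("stored as gzip:      ", digest_gzip)
print("digests differ:      ", digest_plain != digest_gzip)
print("replay matches plain:", digest_replayed == digest_plain)
```

Under these assumptions the two stored digests differ even though the replayed content is byte-identical, which is why comparing raw payload digests alone can misreport a fixity change.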
“…Alam et al [4] describe archival voids or portions of URI spaces that are not present in a web archive. They created multiple Archival Void profiles using Arquivo.pt access logs, and while doing so, they identified and reported access patterns, status code distributions, and issues such as Soft-404 (when a web server responds with an HTTP 200 OK status code for pages that are actually error pages [37]) through Arquivo.pt server logs.…”
Section: Background and Related Work (mentioning)
confidence: 99%