2022
DOI: 10.1007/978-3-031-16802-4_25

CDX Summary: Web Archival Collection Insights

Cited by 1 publication (2 citation statements)
References 14 publications

“…Future work is expected to continue to leverage existing tools and processes developed by Common Crawl for graph generation for this portion of the work. With the complete dataset available in CDX format, overviews of each of the EOT crawl years using CDX summarization tools [1] can be generated. These can be helpful in communicating the contents of this dataset to others.…”
Section: Crawl (mentioning)
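
The statement above refers to generating per-year overviews of a collection from its CDX index with CDX summarization tools. As a rough illustration only, the Python sketch below tallies captures by year, MIME type, and HTTP status from lines in the classic 11-field CDX layout (" CDX N b a m s k r M S V g"). It is not the CDX Summary tool itself: the function name `summarize_cdx`, the field names, and the sample rows are hypothetical, chosen for this example.

```python
from collections import Counter
from typing import Iterable

# Hypothetical field names for the common 11-field CDX layout
# (" CDX N b a m s k r M S V g"): URL key, 14-digit timestamp, original URL,
# MIME type, HTTP status, digest, redirect, meta tags, record length,
# file offset, and WARC/ARC filename.
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype", "statuscode",
              "digest", "redirect", "metatags", "length", "offset", "filename"]


def summarize_cdx(lines: Iterable[str]) -> dict:
    """Hypothetical helper: count captures by year, MIME type, and status."""
    years, mimes, statuses = Counter(), Counter(), Counter()
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and the " CDX ..." header line.
        if not line or line.startswith("CDX "):
            continue
        parts = line.split()
        if len(parts) < len(CDX_FIELDS):
            continue  # malformed row: skip it rather than guess field positions
        rec = dict(zip(CDX_FIELDS, parts))
        total += 1
        years[rec["timestamp"][:4]] += 1  # first four digits are the year
        mimes[rec["mimetype"]] += 1
        statuses[rec["statuscode"]] += 1
    return {"captures": total, "years": years,
            "mimetypes": mimes, "statuses": statuses}


if __name__ == "__main__":
    # Made-up sample rows, not taken from any real crawl.
    sample = [
        " CDX N b a m s k r M S V g",
        "gov,example)/ 20081201000000 http://example.gov/ text/html 200 AAAA - - 512 0 example-00000.warc.gz",
        "gov,example)/a.pdf 20090115000000 http://example.gov/a.pdf application/pdf 200 BBBB - - 2048 512 example-00000.warc.gz",
    ]
    summary = summarize_cdx(sample)
    print(summary["captures"])  # 2
    print(dict(summary["years"]))
```

A real summary of a large crawl would likely also report host or domain distributions and stream the index file rather than holding rows in memory; the sketch only shows the basic counting pattern.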
“…The full dataset is available with a Creative Commons CC0 1.0 Universal (CC0 1.0)¹ Public Domain Dedication and is downloadable from the End of Term Website in the data section². A record for the dataset is also available in the Registry of Open Data on AWS³.…”
Section: Introduction (mentioning)