2017
DOI: 10.1145/3041656

Comparing the Archival Rate of Arabic, English, Danish, and Korean Language Web Pages

Abstract: It has long been suspected that web archives and search engines favor Western and English language webpages. In this article, we quantitatively explore how well indexed and archived Arabic language webpages are as compared to those from other languages. We began by sampling 15,092 unique URIs from three different website directories: DMOZ (multilingual), Raddadi, and Star28 (the last two primarily Arabic language). Using language identification tools, we eliminated pages not in the Arabic language (e.g., Engli…

Cited by 7 publications (4 citation statements)
References 22 publications
“…Most research involving Memento aggregation relates to usage of the aggregator rather than enhancement of the aggregation process. In the same way that prior to MemGator, researchers would state "we requested URIs from the Time Travel Service", this statement was transformed to "we used MemGator to request URIs", indicative that it was useful for researchers to utilize their own aggregator instance [21,14,4]. A facet of this use case is the ability for researchers to customize the set of web archives to be used as the basis for querying, which is performed prior to running MemGator by modifying a configuration file 4 .…”
Section: Related Work (mentioning)
confidence: 99%
“…In the same way that prior to MemGator, researchers would state "we requested URIs from the Time Travel Service", this statement was transformed to "we used MemGator to request URIs", indicative that it was useful for researchers to utilize their own aggregator instance [21,14,4]. A facet of this use case is the ability for researchers to customize the set of web archives to be used as the basis for querying, which is performed prior to running MemGator by modifying a configuration file 4 . This paper examines the aggregation process beyond accessing an aggregator and does so at a more abstract level than the ability to customize the archival sources.…”
Section: Related Work (mentioning)
confidence: 99%
“…Ainsworth et al (Ainsworth et al, 2011) investigated how much of the web was archived and estimated that 35-90% of existing web resources have at least one Memento. Alkwai et al estimated the archive coverage of Arabic websites (Alkwai et al, 2015), and later conducted an additional study (Alkwai et al, 2017) to compare the archiving rates of English-, Arabic-, Danish- and Korean-language web pages. Alkwai showed that English has a higher archiving rate than Arabic, which in turn has a higher archiving rate than Danish or Korean.…”
Section: Age and Availability Of Resources (mentioning)
confidence: 99%