2019 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/cluster.2019.8891000
Large-Scale Analysis of the Docker Hub Dataset

Cited by 28 publications (24 citation statements)
References 13 publications
“…As the number of images increases, duplicate data grows dramatically in the registry. It is reported that when the total size of unpacked images is 167 TB (unpacked from 47 TB of compressed images), only 3.2% of files remain after file-level deduplication, occupying just 24 TB in total [11]. While file-level and chunk-level deduplication yield similar space savings, chunk-level deduplication can cause a dramatic increase in the number of unique objects that must be tracked.…”
Section: Motivationmentioning
confidence: 99%
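To make the file-level deduplication figure quoted above concrete, here is a minimal sketch, not taken from the cited paper, of deduplicating files across unpacked image trees by keying each file on the SHA-256 digest of its content; the function names and directory layout are assumptions made for this illustration.

import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file's content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def file_level_dedup(image_dirs):
    """Scan unpacked image trees; keep one copy per unique file content.

    Returns (logical_bytes, unique_bytes): total bytes before and after
    file-level deduplication.
    """
    unique = {}          # digest -> size of the single stored copy
    logical_bytes = 0    # total size if every file were stored separately
    for root_dir in image_dirs:
        for dirpath, _, filenames in os.walk(root_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path) or os.path.islink(path):
                    continue
                size = os.path.getsize(path)
                logical_bytes += size
                unique.setdefault(file_digest(path), size)  # identical content stored once
    return logical_bytes, sum(unique.values())

In the scenario quoted above, such a scan would report on the order of 167 TB of logical data but only about 24 TB of unique file content.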
“…There are two reasons why the image format cannot support efficient image storage and deployment. (1) Regarding container storage, there is substantial redundant data between different layers of images, which results in a large waste of storage space [10,11]. Although layered images allow layer-level deduplication to reduce storage footprint, there is still substantial data redundancy that cannot be detected and removed at this coarse deduplication granularity.…”
Section: Introductionmentioning
confidence: 99%
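As a rough illustration of why layer-level deduplication is coarse, the sketch below models a registry that stores layers content-addressed by digest: identical layers are stored once, but a file that recurs in two otherwise different layers is still stored twice. The LayerStore class and its serialization are simplified assumptions for this example, not the implementation of the cited systems.

import hashlib

class LayerStore:
    """Toy content-addressed layer store: one blob per unique layer digest."""

    def __init__(self):
        self.blobs = {}  # layer digest -> serialized layer bytes

    def put_layer(self, files):
        """Store a layer given as {path: content_bytes}; dedup whole layers only."""
        payload = b"".join(p.encode() + b"\0" + c for p, c in sorted(files.items()))
        digest = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, payload)   # identical layers stored once
        return digest

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())

store = LayerStore()
libc = b"x" * 1024                                      # pretend shared file content
store.put_layer({"lib/libc.so": libc})                  # base layer of image A
store.put_layer({"lib/libc.so": libc})                  # same layer in image B: deduplicated
store.put_layer({"lib/libc.so": libc,
                 "app/run.sh": b"echo hi"})             # different layer: libc stored again
print(store.stored_bytes())   # counts libc twice, despite identical file content

This is the redundancy that layer granularity cannot detect: only bit-identical layers deduplicate, while duplicate files inside differing layers remain.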
“…Their study differs from ours, as we focus on image upgrades. Zhao et al. [18] studied duplication of files in images on Docker Hub from the point of view of reducing the overall storage requirements of Docker Hub.…”
Section: Related Workmentioning
confidence: 99%
“…Further, though one is advised to use volumes, read-write data still commonly exists in container images. An analysis of 500,000 public container images [63] reports that 44% of the files are documents such as Microsoft Office files, plain text, source code, and scripts, many of which are meant to be modified. It also reports a certain number of database-related files in container images, indicating that Docker developers run databases inside containers.…”
Section: Startup Timementioning
confidence: 99%
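To show the kind of measurement behind the file-type percentages quoted above, the following sketch classifies files in an unpacked image root filesystem by extension. The extension-to-category mapping and the example path are illustrative assumptions, not the classification scheme used in [63].

import os
from collections import Counter

# Hypothetical extension -> category map (illustrative only).
CATEGORIES = {
    ".docx": "document", ".xlsx": "document", ".txt": "document",
    ".py": "document", ".sh": "document", ".c": "document",
    ".db": "database", ".sqlite": "database", ".ibd": "database",
    ".so": "binary", ".a": "binary",
}

def classify_image_tree(root_dir):
    """Count files per category in an unpacked container image rootfs."""
    counts = Counter()
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            counts[CATEGORIES.get(ext, "other")] += 1
    return counts

# Example: per-category share of files in one image's rootfs (hypothetical path).
counts = classify_image_tree("/tmp/image_rootfs")
total = sum(counts.values()) or 1
print({cat: round(100 * n / total, 1) for cat, n in counts.items()})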