2014
DOI: 10.2218/ijdc.v9i2.331

Leveraging High Performance Computing for Managing Large and Evolving Data Collections

Abstract: The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle curators organize newly generated data while cleaning and integrating legacy data when it exists, and deciding what data will be preserved for the long term. Although these actions should be par…

Cited by 7 publications (4 citation statements). References: 0 publications.
“…As of this writing, within XSEDE only the Texas Advanced Computing Center (TACC) is capable of allocating sufficient long-term storage space for the Goodwin Hall sensor data from its RANCH long-term storage system [46]. Even with more advanced data transfer tools [11], moving data across XSEDE still presents a major challenge [3]. If we are to avoid moving data far from the storage, TACC's Lonestar and Stampede become the only viable resources for the associated data processing and reuse.…”
Section: National HPC Infrastructure
confidence: 99%
“…If we are to avoid moving data far from the storage, TACC's Lonestar and Stampede become the only viable resources for the associated data processing and reuse. Furthermore, HPC services may experience frequent interruptions and overall do not match the quality of service promised by their commercial counterparts [3].…”
Section: National HPC Infrastructure
confidence: 99%
“…Actions taken to preserve the data set in a stable format are essential. Cleaning data sets can improve their quality and make it easier for prospective reusers to understand its contents (Arora, Esteva, & Trelogan, 2014). Providing accompanying metadata and other information about provenance can help prospective reusers to find and interpret the data set, and to make judgments about its quality (Fear & Donaldson, 2012; Greenberg, 2017).…”
Section: Introduction
confidence: 99%