Relevance of software reuse in building advanced scientific data processing systems

Marshall, James; Downs, Robert R.; Samadi, Shahin

doi:10.1007/s12145-010-0054-3

Cited by 5 publications

(3 citation statements)

References 12 publications


“…Software citation guidelines (Fox et al., 2021; Katz and Chue Hong, 2018; Smith et al., 2016) and platforms such as Zenodo and the Open Science Framework are meant to make it easier to create a persistent reference to a codebase and therefore facilitate code reuse. Within the Earth sciences, there have been several efforts to facilitate code sharing and reuse, such as the NASA Earth Science Data Systems (ESDS) Software Reuse Portal (Downs et al., 2006; Gerard et al., 2007; Marshall et al., 2010). Despite these efforts, the widespread reuse of code is still nascent, and challenges remain in making code findable, citable, and reusable.…”

Section: Code Sharing and Reuse (mentioning)

confidence: 99%

Revealing Earth science code and data-use practices using the Throughput Graph Database

Thomer, Wofford, Lenard, et al. 2023

Recent Advancement in Geoinformatics and Data Science


The increased use of complex programmatic workflows and open data within the Earth sciences has led to an increase in the need to find and reuse code, whether as examples, templates, or code snippets that can be used across projects. The “Throughput Graph Database” project offers a platform for discovery that links research objects by using structured annotations. Throughput was initially populated by scraping GitHub for code repositories that reference the names or URLs of data archives listed on the Registry of Research Data Repositories (https://re3data.org). Throughput annotations link the research data archives to public code repositories, which makes data-relevant code repositories easier to find. Linking code repositories in a queryable, machine-readable way is only the first step to improving discoverability. A better understanding of the ways in which data is used and reused in code repositories is needed to better support code reuse. In this paper, we examine the data practices of Earth science data reusers through a classification of GitHub repositories that reference geology and paleontology data archives. A typology of seven reuse classes was developed to describe how data were used within a code repository, and it was applied to a subset of 129 public code repositories on GitHub. Code repositories could have multiple typology assignments. Data use for Software Development dominated (n = 44), followed by Miscellaneous Links to Data Archives (n = 41), Analysis (n = 22), and Educational (n = 20) uses. GitHub repository features show some relationships to the assigned typologies, which indicates that these characteristics may be leveraged to systematically predict a code repository’s category or discover potentially useful code repositories for certain data archives.



“…Planning for the next set of Earth missions [14] has included deploying a prototype RES within each new mission site. This subject is discussed further in Section 8 below.…”

Section: Use Cases and Requirements (mentioning)

confidence: 99%

“…This would help break down some major barriers to software reuse within the community as identified in surveys conducted by the WG [1]. However, as indicated in Section 5, recent direction from NASA Headquarters [14] has indicated that new missions could benefit from the implementation of a set of distributed systems, run on a per-mission basis. These implementations could begin with the upcoming decadal survey missions, which have been recommended by the National Research Council [15].…”

Section: Centralized RES or Distributed Systems (mentioning)

confidence: 99%