The Data Documentation Initiative (DDI) is an emerging metadata standard for the social sciences. The DDI is in active use by many data specialists and archivists, but researchers themselves have been slow to recognize the benefits of the standards approach to metadata. This paper outlines how the DDI has evolved since its inception in 1995 and discusses ways to broaden its impact in the social science research community.
In 1999, when NASA's Mars Climate Orbiter missed its intended orbit and burned up in the Martian atmosphere, the media had a heyday over the reason: one team had used metric units in its thrust calculations, another, imperial. The navigation software that exchanged this information lacked a built-in process to check units. So when one team's software produced data in imperial units rather than the expected metric ones, the spacecraft was set on the wrong trajectory. The result was the loss of five years of effort and hundreds of millions of taxpayers' dollars. Two decades on, such problems persist. Researchers across fields often assume that their colleagues understand details without specifying them, and are therefore remiss when documenting units. Sometimes they leave them out entirely, provide ones that have multiple definitions or use units of convenience that have never been formally recognized. Humans struggle to interpret numbers with sloppy or missing units, and it is much more difficult when computers are involved. Most software packages, data-management tools and programming languages lack built-in support for associating units with numeric data (with the exception of the language F#). This means that information is essentially stored and managed as 'unitless' values. Disciplines including bioscience and aerospace engineering have adopted conventions for unit representation, such as the Unified Code for Units of Measure (UCUM) and the Quantities, Units, Dimensions, and Types (QUDT) Ontology. But there are no broadly agreed technical specifications for how to represent quantities and their associated units without confusing machines. There have been many calls in recent years to make data sets FAIR (Findable, Accessible, Interoperable and Reusable;
We have created tools that automate one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. The C2Metadata ("Continuous Capture of Metadata for Statistical Data") Project creates a metadata workflow paralleling the data management process by deriving provenance information from scripts used to manage and transform data. C2Metadata differs from most previous data provenance initiatives by documenting transformations at the variable level rather than describing a sequence of opaque programs. Command scripts for statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogues and codebooks and to create "variable lineages" for auditing software operations. Better data documentation makes research more transparent and expands the discovery and re-use of research data.
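The idea of translating script commands into an intermediate representation and then deriving a variable lineage can be sketched roughly as follows. This is a minimal illustration only, not the C2Metadata implementation: the record fields (`command`, `target`, `expression`, `sources`), the helper names `translate` and `lineage`, and the tiny Stata-like input are all assumptions made for the example, and the records do not follow the actual SDTL schema.

```python
import json
import re

# Hypothetical, heavily simplified stand-in for an SDTL-like record.
# Only Stata-style "gen <var> = <expression>" lines are recognised.
ASSIGNMENT = re.compile(r"^\s*(?:gen|generate)\s+(\w+)\s*=\s*(.+?)\s*$")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def translate(script_line):
    """Turn one assignment command into a language-neutral record."""
    match = ASSIGNMENT.match(script_line)
    if match is None:
        raise ValueError(f"unsupported command: {script_line!r}")
    target, expression = match.groups()
    # Naive source detection: every identifier in the expression counts as
    # a source variable (function names would be misread as variables).
    sources = sorted(set(IDENTIFIER.findall(expression)))
    return {
        "command": "Compute",
        "target": target,
        "expression": expression,
        "sources": sources,
    }


def lineage(variable, steps):
    """Return the original input variables that `variable` is derived from."""
    producers = {step["target"]: step for step in steps}
    origins, visited, stack = set(), set(), [variable]
    while stack:
        name = stack.pop()
        if name in visited:
            continue
        visited.add(name)
        step = producers.get(name)
        if step is None:
            origins.add(name)  # not produced by the script: an input variable
        else:
            stack.extend(step["sources"])
    return origins


script = [
    "gen height_m = height_cm / 100",
    "gen bmi = weight_kg / (height_m * height_m)",
]
steps = [translate(line) for line in script]

if __name__ == "__main__":
    print(json.dumps(steps, indent=2))
    print(sorted(lineage("bmi", steps)))
```

Because each record names its target and source variables, the lineage of `bmi` can be traced back to `height_cm` and `weight_kg` without re-running the script, which is the kind of variable-level provenance the abstract describes.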
Intended audience: This document is for software designers who are developing DDI applications. The designers may be familiar or unfamiliar with the DDI specification.

Abstract: This best practices document looks at a possible way to design components that can be combined to create DDI applications. Given that object-oriented design is the most common programming paradigm, and that systems are often based around service-oriented principles, and given the modular design of DDI 3.0 itself, this document provides an architectural model that can be a reference point for implementers. The document also takes into consideration issues of maintenance and management of DDI applications, and discusses best practices for application documentation and configuration. The focus is on interoperability of DDI applications.

Status: This document is updated periodically on no particular schedule.

Introduction: This best practices document looks at a possible way to design components that can be combined to create DDI applications. The paper is targeted at developers, but it does not assume a high level of DDI knowledge. It is intended to serve as a starting point for developers new to the DDI.

Problem statement: Software developers who are new to the DDI 3.0 standard may find the standard daunting. This best practices document provides an overview of how an application may be structured so that developers have a starting point for the design of their application.

Terminology definitions:
DDI: When used without a version, DDI refers to the latest DDI specification, currently version 3.0. When older versions are referenced, the version number will be explicitly specified.
DDI community: Any person or organization working with the DDI specification.
DDI application: A software application that reads and/or writes DDI XML.
Specification: The DDI specification.
Component: A piece of software with a specific purpose with a well-defined input and well-defined output.
Middleware: In the context of this best practices paper, middleware refers to utilities that manage the interface between the DDI metadata model and application services or high-level end-user tools.
Task: An activity that a person undertakes in order to create, edit, or view documentation about data.
End user: Person performing work in the data life cycle for whom DDI metadata is required. The end user will likely not even be aware of the DDI metadata in the application he or she is using.
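As a rough illustration of the "component" and "middleware" definitions above, the sketch below chains two small components, each with a well-defined input and output, into one application step. Everything in it is an assumption made for the example: the names `Variable`, `read_variables`, `to_display_rows` and `compose` are hypothetical, and the sample XML uses placeholder element names that are not taken from the DDI 3.0 schema.

```python
from dataclasses import dataclass
from typing import Callable, List
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Variable:
    """Minimal in-memory metadata model used between components."""
    name: str
    label: str


def read_variables(xml_text: str) -> List[Variable]:
    """Reader component: XML text in, metadata model out."""
    root = ET.fromstring(xml_text)
    return [
        Variable(elem.get("name", ""), elem.get("label", ""))
        for elem in root.iter("variable")  # placeholder element name
    ]


def to_display_rows(variables: List[Variable]) -> List[str]:
    """Middleware component: metadata model in, end-user strings out."""
    return [f"{v.name}: {v.label}" for v in variables]


def compose(*components: Callable) -> Callable:
    """Chain components so each output feeds the next component's input."""
    def pipeline(value):
        for component in components:
            value = component(value)
        return value
    return pipeline


# Placeholder document; NOT valid DDI XML.
SAMPLE = (
    '<codebook>'
    '<variable name="age" label="Age in years"/>'
    '<variable name="sex" label="Sex of respondent"/>'
    '</codebook>'
)

view_codebook = compose(read_variables, to_display_rows)

if __name__ == "__main__":
    for row in view_codebook(SAMPLE):
        print(row)
```

Keeping the XML handling inside one component means the end-user step only sees the metadata model, which matches the definition of an end user who need not be aware of the underlying DDI metadata.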
Abstract: One of the objectives in creating DDI 3.0 was full machine-actionability. This requires strict versioning of objects so that users understand the change history of the resources they are using. This Best Practice is designed to provide some guidelines to metadata creators and publishers about how to version metadata and publish it for others to use.

Status: Draft. This document is updated periodically on no particular schedule. Send comments to