Abstract: With new emerging technologies, such as satellites and drones, archaeologists collect data over large areas. However, it becomes difficult to process such data in time. Archaeological data also have many different formats (images, texts, sensor data) and can be structured, semi-structured and unstructured. Such variety makes data difficult to collect, store, manage, search and analyze effectively. A few approaches have been proposed, but none of them covers the full data lifecycle nor provides an efficient dat…
“…A preliminary step required for “thinking bigger” about data is for archaeologists to broaden their understanding of data as part of a continuous life cycle. Although definitions vary, the data life cycle refers to the main stages and transformations that data take as they move from the planning of data acquisition to data recording, processing, analysis, interpretation, dissemination, curation, and reuse (Borgman 2019; Faniel et al 2018; Liu 2021; McManamon and Ellison 2022; Williams and Williams 2019; Yakel et al 2019). Archaeologists should plan for each stage of the data life cycle and consider, throughout this cycle, ethical implications and access.…”
Section: Archaeological Data and Data Practices
Most archaeological investigations in the United States and other countries must comply with preservation laws, especially if they are on government property or supported by government funding. Academic and cultural resource management (CRM) studies have explored various social, temporal, and environmental contexts and produce an ever-increasing volume of archaeological data. More and more data are born digital, and many legacy data are digitized. There is a building effort to synthesize and integrate data at a massive scale and create new data standards and management systems. Taxpayer dollars often fund archaeological studies that are intended, in spirit, to promote historic preservation and provide public benefits. However, the resulting data are difficult to access and interoperationalize, and they are rarely collected and managed with their long-term security, accessibility, and ethical reuse in mind. Momentum is building toward open data and open science as well as Indigenous data sovereignty and governance. The field of archaeology is reaching a critical point where consideration of diverse constituencies, concerns, and requirements is needed to plan data collection and management approaches moving forward. This theme issue focuses on challenges and opportunities in archaeological data collection and management in academic and CRM contexts.
“…Thus, the notion of data lake is in fact an architecture pattern in which the functionalities are well-defined. To avoid the data lake construction issues, some works narrow their system for a specific use case according to different domains [31,34,36,40,43]. We adopt a more abstract point of view, and aim to define a framework allowing to generalize the data lake pattern and to unify the component interactions.…”
The management of Big Data requires flexible systems to handle the heterogeneity of data models as well as the complexity of analytical workflows. Traditional systems like data warehouses have reached their limits due to their rigid schema-on-write paradigm, which requires well-identified and well-defined use cases before data can be ingested. Data lakes, with their schema-on-read paradigm, have been proposed as more flexible systems in which raw data are stored directly in their original format, together with metadata, and are accessed and transformed only when users need to process or analyze them. Thus, it is necessary to define and control the different levels of abstraction and the dependencies among the functionalities of a data lake in order to use it efficiently. In this article, we present a formal framework that aims to define a data lake pattern and to unify the interactions among its functionalities. We use category theory as the theoretical foundation to benefit from its high level of abstraction and its compositionality. By relying on different categories and functors, we ensure navigation among the functionalities and allow the composition of multiple operations, while keeping track of the entire lineage of data. We also show how our framework can be applied to a simple example of a data lake.
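The schema-on-read idea and the composition of operations with lineage tracking described in this abstract can be illustrated with a small sketch. This is not the category-theoretic framework of the cited article, only a loose Python analogy: `ToyLake`, `Step`, `step`, the keys, and the sample records are all hypothetical names and made-up data introduced for illustration.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Step:
    """A named transformation (hypothetical helper, not from the article)."""
    names: tuple
    fn: Callable[[Any], Any]

    def then(self, other: "Step") -> "Step":
        # Composition applies self first, then other, and concatenates
        # the names so the order of operations is preserved as lineage.
        return Step(self.names + other.names, lambda x: other.fn(self.fn(x)))


def step(name: str, fn: Callable[[Any], Any]) -> Step:
    return Step((name,), fn)


class ToyLake:
    """Stores raw payloads untouched; structure is imposed only on read."""

    def __init__(self):
        self._raw = {}

    def ingest(self, key: str, payload: str, fmt: str) -> None:
        # Schema-on-read: no parsing or validation at ingestion time,
        # only the original payload plus a little metadata.
        self._raw[key] = {"payload": payload, "format": fmt}

    def read(self, key: str, pipeline: Step):
        entry = self._raw[key]
        lineage = (f"raw:{key}",) + pipeline.names
        return pipeline.fn(entry["payload"]), lineage


# Reusable steps; the field name "depth_cm" is an illustrative assumption.
parse_csv = step("parse_csv", lambda text: list(csv.DictReader(io.StringIO(text))))
parse_json = step("parse_json", json.loads)
depths = step("depths_cm", lambda rows: [float(r["depth_cm"]) for r in rows])
mean = step("mean", lambda xs: sum(xs) / len(xs))

lake = ToyLake()
# Made-up sample records in two different raw formats.
lake.ingest("trench_a", "find_id,depth_cm\nA1,12.5\nA2,30.0\n", "csv")
lake.ingest("trench_b", '[{"find_id": "B1", "depth_cm": 40}]', "json")

value_a, lineage_a = lake.read("trench_a", parse_csv.then(depths).then(mean))
value_b, lineage_b = lake.read("trench_b", parse_json.then(depths).then(mean))

print(value_a, lineage_a)
print(value_b, lineage_b)
```

The same `depths` and `mean` steps are reused for both sources; only the parsing step differs, and it is chosen at read time rather than at ingestion. Each result carries the ordered list of operations that produced it, which is a simplified stand-in for the lineage tracking the abstract mentions.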