ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics

Liu, Pengfei; Loudcher, Sabine; Darmont, Jérôme; Noûs, Camille

doi:10.1145/3472163.3472266

Cited by 6 publications

(2 citation statements)

References 10 publications

“…A preliminary step required for “thinking bigger” about data is for archaeologists to broaden their understanding of data as part of a continuous life cycle. Although definitions vary, the data life cycle refers to the main stages and transformations that data take as they move from the planning of data acquisition to data recording, processing, analysis, interpretation, dissemination, curation, and reuse (Borgman 2019; Faniel et al 2018; Liu 2021; McManamon and Ellison 2022; Williams and Williams 2019; Yakel et al 2019). Archaeologists should plan for each stage of the data life cycle and consider, throughout this cycle, ethical implications and access.…”

Section: Archaeological Data and Data Practices (mentioning)

confidence: 99%

Refining Archaeological Data Collection and Management

Heilen

Manney

2023

Adv. Archaeol. Pract.

Most archaeological investigations in the United States and other countries must comply with preservation laws, especially if they are on government property or supported by government funding. Academic and cultural resource management (CRM) studies have explored various social, temporal, and environmental contexts and produce an ever-increasing volume of archaeological data. More and more data are born digital, and many legacy data are digitized. There is a building effort to synthesize and integrate data at a massive scale and create new data standards and management systems. Taxpayer dollars often fund archaeological studies that are intended, in spirit, to promote historic preservation and provide public benefits. However, the resulting data are difficult to access and interoperationalize, and they are rarely collected and managed with their long-term security, accessibility, and ethical reuse in mind. Momentum is building toward open data and open science as well as Indigenous data sovereignty and governance. The field of archaeology is reaching a critical point where consideration of diverse constituencies, concerns, and requirements is needed to plan data collection and management approaches moving forward. This theme issue focuses on challenges and opportunities in archaeological data collection and management in academic and CRM contexts.

“…Thus, the notion of data lake is in fact an architecture pattern in which the functionalities are well-defined. To avoid the data lake construction issues, some works narrow their system for a specific use case according to different domains [31,34,36,40,43]. We adopt a more abstract point of view, and aim to define a framework allowing to generalize the data lake pattern and to unify the component interactions.…”

Section: Introduction (mentioning)

confidence: 99%

A Formal Framework for Data Lakes Based on Category Theory

Guyot

Gillet

Leclercq

et al. 2022

International Database Engineered Applications Symposium

The management of Big Data requires flexible systems to handle the heterogeneity of data models as well as the complexity of analytical workflows. Traditional systems like data warehouses have reached their limits due to their rigid schema-on-write paradigm, which requires well-identified and well-defined use cases to ingest data. Data lakes, with their schema-on-read paradigm, have been proposed as more flexible systems in which raw data are stored directly in their original format, associated with metadata, to be accessed and transformed only when users need to process or analyze them. Thus, it is necessary to define and control the different levels of abstraction and the dependencies among the functionalities of a data lake to use it efficiently. In this article, we present a formal framework aiming to define a data lake pattern and to unify the interactions among the functionalities. We use category theory as a theoretical foundation to benefit from its high level of abstraction and its compositionality. By relying on different categories and functors, we ensure navigation among the functionalities and allow the composition of multiple operations, while keeping track of the entire lineage of data. We also show how our framework can be applied to a simple example of a data lake.
