Proceedings of the 21st International Conference on Enterprise Information Systems 2019
DOI: 10.5220/0007706300720083

Metadata Management for Textual Documents in Data Lakes

Abstract: Data lakes have emerged as an alternative to data warehouses for the storage, exploration and analysis of big data. In a data lake, data are stored in a raw state and bear no explicit schema. Hence, an efficient metadata system is essential to keep the data lake from turning into a so-called data swamp. Existing work on data lake metadata management mostly focuses on structured and semi-structured data, with little research on unstructured data. Thus, we propose in this paper a methodological approach to build and …

Cited by 22 publications (28 citation statements); references 11 publications.

“…Other models exist which are not part of a specific system. However, many of these, also including that by Walker and Alrehamy, only focus on a specific topic and thus, only support a limited set of use cases which makes them non-generic, e.g., [8,11,17,20,21]. Thenceforth these models are not considered here.…”
Section: Related Work: Discussion of Existent Metadata Models (mentioning; confidence: 99%)
“…A central aspect of metadata management is the definition of a metadata model (e.g., [10,15,17]). By our definition a metadata model describes the relations between data and metadata elements and what metadata is collected, e.g., in the form of an explicit schema, a formal definition, or a textual description.…”
Section: Introduction (mentioning; confidence: 99%)
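The definition quoted above holds that a metadata model fixes which metadata is collected and how it relates to the data, for example through an explicit schema. The minimal Python sketch below makes that idea concrete. It is not the model of any cited work. The schema contents, the `MetadataRecord` class and the `validate` function are all hypothetical, chosen only to show records that link a data object to metadata elements and are checked against an explicit schema.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical explicit schema: for each kind of data object, the metadata
# elements that are collected and the Python type each value must have.
METADATA_SCHEMA: dict[str, dict[str, type]] = {
    "textual_document": {"title": str, "author": str, "language": str, "word_count": int},
}


@dataclass(frozen=True)
class MetadataRecord:
    """Links one data object (by id) to one metadata element and its value."""
    object_id: str
    element: str
    value: Any


def validate(object_kind: str, records: list[MetadataRecord]) -> list[str]:
    """Check the records of a single object against the schema.

    Returns human-readable problems; an empty list means the records conform.
    """
    schema = METADATA_SCHEMA[object_kind]
    present = {r.element: r.value for r in records}
    problems = []
    for element, expected in schema.items():
        if element not in present:
            problems.append(f"missing element: {element}")
        elif not isinstance(present[element], expected):
            problems.append(f"wrong type for {element}: expected {expected.__name__}")
    for element in present:
        if element not in schema:
            problems.append(f"unknown element: {element}")
    return problems
```

Keeping metadata as separate (object, element, value) records keeps the raw data untouched, in line with the schema-less nature of a data lake. The schema then governs only the metadata layer.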
“…The definition of a metadata model for data lakes also involves identifying the metadata to be considered. To this end, we extend a metadata typology that categorizes metadata into intra-object, inter-object and global metadata [23] with new types of inter-object (relationships) and global (index, event logs) metadata.…”
Section: Metadata Typology (mentioning; confidence: 99%)
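The typology quoted above can be pictured as a small in-memory structure with the following parts:

* intra-object metadata: properties of a single document
* inter-object metadata: typed relationships between documents
* global metadata: an inverted index over document terms and an event log

This is a hypothetical sketch only. The class `DataLakeMetadata`, its methods and the naive regex tokenizer are illustrative assumptions, not the cited authors' design.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataLakeMetadata:
    # Intra-object metadata: descriptive properties of each single object.
    intra_object: dict[str, dict[str, object]] = field(default_factory=dict)
    # Inter-object metadata: (source, target, kind) relationships.
    relationships: list[tuple[str, str, str]] = field(default_factory=list)
    # Global metadata (1): inverted index from term to the ids containing it.
    index: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    # Global metadata (2): event log of (UTC timestamp, action, object id).
    event_log: list[tuple[str, str, str]] = field(default_factory=list)

    def _log(self, action: str, object_id: str) -> None:
        timestamp = datetime.now(timezone.utc).isoformat()
        self.event_log.append((timestamp, action, object_id))

    def ingest_text(self, doc_id: str, text: str, **properties: object) -> None:
        """Record intra-object metadata and index the document's terms."""
        terms = re.findall(r"\w+", text.lower())  # naive tokenizer (assumption)
        self.intra_object[doc_id] = {**properties, "word_count": len(terms)}
        for term in set(terms):
            self.index[term].add(doc_id)
        self._log("ingest", doc_id)

    def relate(self, source: str, target: str, kind: str) -> None:
        """Record an inter-object relationship of the given kind."""
        self.relationships.append((source, target, kind))
        self._log(f"relate:{kind}", source)

    def search(self, term: str) -> set[str]:
        """Look up a term in the global index without creating new keys."""
        return set(self.index.get(term.lower(), set()))
```

For example, after ingesting two short texts that both contain "data", `search("data")` returns both document ids. The event log records each ingestion and relationship in order.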
“…It must provide synchronous and asynchronous communication capability. (9) The data governance layer provides a set of tools to establish and execute plans and programs for data quality control [4]. Each of the above layers is implemented with one or more frameworks of the Apache Hadoop ecosystem, e.g., Atlas, HDFS, HIVE, OpenLdap, Spark, etc.…”
Section: Example DH Projects Involving Data Lakes; 2.1 HyperThesau (mentioning; confidence: 99%)