2016
DOI: 10.7250/csimq.2016-9.04

Metadata Extraction and Management in Data Lakes With GEMMS

Abstract: In addition to volume and velocity, Big data is also characterized by its variety. Variety in structure and semantics requires new integration approaches which can resolve the integration challenges also for large volumes of data. Data lakes should reduce the upfront integration costs and provide a more flexible way for data integration and analysis, as source data is loaded in its original structure to the data lake repository. Some syntactic transformation might be applied to enable access to the data in one…

citations
Cited by 45 publications
(43 citation statements)
references
References 19 publications
(25 reference statements)
“…Thus, to ensure data accessibility, exploration, and exploitation, an efficient and effective metadata system becomes an indispensible component in data lakes (Quix et al, 2016). Yet, most of the research work on data lakes still concentrate on structured data, or semi-structured data only (Farid et al, 2016;Farrugia et al, 2016;Madera and Laurent, 2016;Quix et al, 2016;Klettke et al, 2017). So far, unstructured data have not received enough consideration in the relevant research literature, while more often than not unstructured heterogeneous data occur frequently (Miloslavskaya and Tolstoy, 2016).…”
Section: Related Work (mentioning)
confidence: 99%
“…We only have partial solutions in the literature. Some works concentrate on the detection of relationships between different datasets [1,9,27]. Some other work focus on the extraction of metadata for unstructured data (mostly textual data) [27,29].…”
Section: Metadata (mentioning)
confidence: 99%
“…Such data-intensive processing environments are hard to manage [9] [78] as the data lifecycle inside them is so complicated. Given a data product, tracing its sources and finding all the processing steps applied on it is challenging.…”
Section: Introduction (mentioning)
confidence: 99%