2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) 2016
DOI: 10.1109/icdmw.2016.0033

Towards Information Profiling: Data Lake Content Metadata Management

Abstract: There is currently a burst of Big Data (BD) being processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These data require new techniques for data integration and schema alignment, both to make the data usable by its consumers and to discover the relationships linking its content. Metadata services that discover and describe this content can provide these capabilities. However, there is currently no systematic approach to this kind of metadata discovery and management. …
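To make "metadata services which discover and describe their content" concrete, here is a minimal sketch of content-metadata extraction for one tabular dataset in a data lake. It assumes the dataset arrives as CSV text. For each column it records an inferred type, a null count and a distinct-value count. The names `ColumnProfile`, `infer_type` and `profile_csv` are hypothetical. This is an illustration of column profiling in general, not the method proposed in the paper.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Hypothetical content-metadata record for one column of a dataset."""
    name: str
    inferred_type: str
    null_count: int
    distinct_count: int


def infer_type(values):
    """Return 'int', 'float', 'string' (or 'empty') for a list of non-empty strings."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "empty"
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


def profile_csv(text):
    """Profile every column of a CSV document (assumed to have a header row)."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    columns = {name: [] for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            columns[name].append(row[name])  # short rows yield None here

    profiles = []
    for name, raw in columns.items():
        # Treat empty strings and missing cells as nulls.
        present = [v for v in raw if v not in ("", None)]
        profiles.append(
            ColumnProfile(
                name=name,
                inferred_type=infer_type(present),
                null_count=len(raw) - len(present),
                distinct_count=len(set(present)),
            )
        )
    return profiles


if __name__ == "__main__":
    sample = "id,city,temp\n1,Paris,12.5\n2,Lyon,\n3,Paris,9\n"
    for profile in profile_csv(sample):
        print(profile)
```

A real metadata service would also store these profiles in a catalog alongside structural and semantic metadata. The sketch covers only per-column statistics.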

Cited by 34 publications (42 citation statements)
References 18 publications (35 reference statements)
“…So that it is mandatory to set up a metadata management system for DL. In fact, the importance of metadata has been emphasized in many papers [1,7,31]. The first research problem that needs to be solved is the content of the metadata.…”
Section: Metadata (mentioning)
confidence: 99%
“…We only have partial solutions in the literature. Some works concentrate on the detection of relationships between different datasets [1,9,27]. Some other work focus on the extraction of metadata for unstructured data (mostly textual data) [27,29].…”
Section: Metadata (mentioning)
confidence: 99%
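The statement above mentions detecting relationships between different datasets. One common heuristic for this, assumed here and not taken from the cited works, scores pairs of columns from different datasets by the Jaccard similarity of their value sets. It flags pairs above a threshold as candidate join keys. The threshold of 0.5 and the names `jaccard` and `candidate_links` are illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_links(datasets, threshold=0.5):
    """Find column pairs across datasets whose value sets overlap strongly.

    datasets: {dataset_name: {column_name: iterable of values}} (hypothetical layout)
    Returns (left, right, score) tuples sorted by descending score.
    """
    columns = [
        (ds, col, set(values))
        for ds, cols in datasets.items()
        for col, values in cols.items()
    ]
    links = []
    for (ds1, c1, v1), (ds2, c2, v2) in combinations(columns, 2):
        if ds1 == ds2:
            continue  # only look for relationships between different datasets
        score = jaccard(v1, v2)
        if score >= threshold:
            links.append((f"{ds1}.{c1}", f"{ds2}.{c2}", round(score, 3)))
    return sorted(links, key=lambda link: -link[2])


if __name__ == "__main__":
    lake = {
        "sales": {"customer_id": [1, 2, 3, 4], "amount": [10, 20, 30, 40]},
        "customers": {"id": [2, 3, 4, 5], "age": [30, 41, 52, 63]},
    }
    print(candidate_links(lake))
```

The all-pairs comparison is quadratic in the number of columns. At data-lake scale, approximate techniques such as MinHash sketches are typically used instead of exact set intersection.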
“…Source (initial results / accepted): Scopus 108 / 53 papers: [1]-[3], [5], [9], [10], [13]-[19], [23]-[29], [31]-[33], [37], [40], [45], [49], [50], [57], [60]-[66], [68], [70], [71], [73], [76]-[78], [81]-[84], [88], [90], [91], [93]-[95]; Springer 222 / 20 papers: [4], [6], [12], [21], [30], [36], [38], [39], [41]-[43], [47], [51], [53], [69], [74], [79], [85], [86], [92]; Google Scholar 197 / 6 papers:...…”
Section: Source (mentioning)
confidence: 99%
“…The continuous challenge for healthcare professionals is the retrieval and extraction of medical data and this is due to the scarcity of technologies integration as well as important data which are time consuming and even involves manual workflows [1]. The data inaccessibility renders its inconsequential even though the data exists.…”
Section: Introduction (mentioning)
confidence: 99%