2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018
DOI: 10.1109/icde.2018.00093

Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery

Abstract: Employees that spend more time finding relevant data than analyzing it suffer a data discovery problem. The large volume of data in enterprises, and sometimes the lack of knowledge of the schemas aggravates this problem. Similar to how we navigate the Web today, we propose to identify semantic links that assist analysts in their discovery tasks. These links relate tables to each other, to facilitate navigating the schemas. They also relate data to external data sources such as ontologies and dictionaries, to h…

Cited by 60 publications (52 citation statements)
References 27 publications
“…In effect, existing approaches put the burden on the SPARQL users or systems to find out precisely how to write a conjunctive federated query. An emerging research direction entails automatically discovering links between datasets using Word Embeddings [15]. However, the current focus is mostly on relational data or unstructured data [7].…”
Section: Related Work (mentioning)
confidence: 99%
“…Based on these virtual links, a set of more than twelve specialized federated query templates over these data stores was defined 14 . These templates are also available through a template-based search engine 15 . Moreover, as an example of facilitating knowledge discovery, we can mention the virtual link sets between OMA and Bgee.…”
Section: Benefits and A Sib Swiss Institute Of Bioinformatics' Applic… (mentioning)
confidence: 99%
“…Thus, the features upon which their model was trained were not the choice of matchers, but rather the structure and various counting statistics of the match result. Recently, word embeddings were used to enhance the effectiveness of schema matchers by Fernandez et al. (2018).…”
Section: Machine Learning For Data Integration (mentioning)
confidence: 99%
“…To avoid this issue, federated approaches have recently been proposed (24–27), but to the best of our knowledge, none of them proposes a vocabulary and patterns to extensively, explicitly and formally describe how the data sources can be interlinked further than mostly considering ‘same as’-like mappings; in effect, they put the burden on the users to find out precisely how to write a conjunctive federated query. An emerging research direction entails automatically discovering links between datasets using Word Embeddings (28). We did not pursue this approach, given that it is computationally expensive and that for our study writing the relational-to-Resource Description Framework (RDF) mappings proved more straightforward.…”
Section: Introduction (mentioning)
confidence: 99%