2020
DOI: 10.1007/978-3-030-45439-5_18

Leveraging Schema Labels to Enhance Dataset Search

Abstract: A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match the description text. We propose a novel schema label generation model which generates possible schema labels based on dataset table content. We incorporate the generated schema labels into a mixed ranking model which not only consi…

citations
Cited by 17 publications
(10 citation statements)
references
References 14 publications
“…Trabelsi et al [37] propose custom embeddings for column headers based on multiple contexts for table retrieval, and find representing numerical cell values to be useful. Chen et al [8] utilize matrix factorization to generate additional table headers and then show that those generated headers can improve the performance of unsupervised table search.…”
mentioning
confidence: 99%
“…The metadata is then mapped to the Google's knowledge graph, which is then used for dataset duplicates detection and for dataset discovery. Chen et al (2020) enrich metadata records with labels based on the dataset content. Chapman et al (2020) describe the whole dataset discovery process comprising querying for datasets, query processing resulting in a list of datasets, result handling and its presentation.…”
Section: Dataset Discovery Techniques
mentioning
confidence: 99%
“…Zhang & Balog, (2018) propose a semantic matching method for table retrieval where various embedding features are used. Chen et al (2020a) first learn the embedding representations of table headers and generate new headers with embedding features and curated features (Chen et al, 2018) for data tables. They show that the generated headers can be combined with the original fields of the table in order to accurately predict the relevance score of a query-table pair, and improve ranking performance.…”
Section: Structured Document Retrieval
mentioning
confidence: 99%