2016
DOI: 10.1007/978-3-319-34129-3_1

Detecting Similar Linked Datasets Using Topic Modelling

Abstract. The Web of Data is growing continuously with respect to both the size and the number of published datasets. Porting a dataset to five-star Linked Data, however, requires its publisher to link it with the already available linked datasets. Given the size and growth of the Linked Data Cloud, the current, mostly manual approach to detecting relevant datasets for linking is obsolete. We study the use of topic modelling for dataset search experimentally and present Tapioca, a linked dat…

Cited by 14 publications (9 citation statements); references 14 publications.
“…Moreover, existing approaches consider schema-level [12,8,19] or data-level information [10,13] as input for the classification task. In [34] the topic extraction of RDF datasets is done through the use of schema and data level information.…”
Section: Related Work; citation type: mentioning; confidence: 99%
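The schema-level versus data-level distinction can be illustrated with a minimal Python sketch. It assumes a dataset is given as (subject, predicate, object) string triples in which literals start with a double quote, and it treats the classes used as rdf:type objects plus the predicates as schema-level information and the literal values as data-level information. The function `split_profile` and the example triples are hypothetical and do not reproduce any of the cited approaches.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def split_profile(triples):
    """Split (s, p, o) triples into schema-level and data-level information.

    Assumptions (hypothetical, for illustration only):
    - schema level = classes used as rdf:type objects plus all predicates;
    - data level = literal values, recognised by a leading double quote.
    """
    classes = Counter()
    predicates = Counter()
    literals = []
    for _subject, predicate, obj in triples:
        predicates[predicate] += 1
        if predicate == RDF_TYPE:
            classes[obj] += 1
        elif obj.startswith('"'):
            literals.append(obj.strip('"'))
    return {"classes": classes, "predicates": predicates, "literals": literals}


# Hypothetical example data.
triples = [
    ("http://example.org/berlin", RDF_TYPE, "http://dbpedia.org/ontology/City"),
    ("http://example.org/berlin", "http://xmlns.com/foaf/0.1/name", '"Berlin"'),
    ("http://example.org/paris", RDF_TYPE, "http://dbpedia.org/ontology/City"),
]
profile = split_profile(triples)
print(profile["classes"].most_common())
```

A schema-level approach would feed only `classes` and `predicates` to its classifier, while a data-level approach would work from `literals`.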
“…Some approaches propose to model the documents (text corpora) containing natural language as a mixture of topics, where each topic is treated as a probability distribution over words such as Latent Dirichlet Allocation (LDA) [7], Pachinko Allocation [21] or Probabilistic Latent Semantic Analysis (pLSA) [15]. As in [34], authors present TAPIOCA¹⁶, a Linked Data search engine for determining the topical similarity between datasets. TAPIOCA takes as input the description of a dataset and searches for datasets with similar topics which are assumed to be good candidates for linking.…”
Section: Related Work; citation type: mentioning; confidence: 99%
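The topic model that these approaches rely on can be sketched as LDA fitted with collapsed Gibbs sampling, one standard inference method for it. This is a minimal, standard-library-only illustration and not TAPIOCA's implementation: the function `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions. Each document is a list of tokens; the result gives each document's distribution over topics (`theta`) and each topic's distribution over words (`phi`).

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to token lists with collapsed Gibbs sampling (illustrative sketch).

    Returns (theta, phi, vocab): theta[d][k] = P(topic k | doc d),
    phi[k][v] = P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total tokens per topic
    z = []                               # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional P(z_i = t | all other assignments), unnormalised.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


# Hypothetical toy corpus: one token list per dataset description.
docs = [
    ["city", "country", "population", "city", "mayor"],
    ["gene", "protein", "disease", "gene"],
    ["country", "capital", "city", "population"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

For real corpora a maintained LDA library would normally replace this loop; the sketch only shows which counts the sampler keeps and the conditional it samples from.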
“…Profiles may encompass structure of the datasets, i.e. the used classes and predicates [33], semantically related labels [117] or topics which characterize the content [1] where a topic is a resource from a well-known and highly reused LD data source, e.g. DBpedia [83] or Wikidata²⁵.…”
Citation type: mentioning; confidence: 99%
“…As an additional contribution, they also return mappings between dataset classes. Other keyword-based approaches apply topic modelling methods (Ben Ellefi et al., 2016a; Röder et al., 2016) based on the assumption that similar datasets should have similar topics. For example, in TAPIOCA (Röder et al., 2016), a corpus of documents, where a document characterizes a dataset by its schema metadata (class and properties labels), is used as input to the LDA (Latent Dirichlet Allocation) algorithm to create a topic model.…”
Section: Dataset Recommendation For Link Discovery; citation type: mentioning; confidence: 99%
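A document of the kind described here, built from a dataset's class and property labels, could be assembled as in the following sketch. It assumes the classes and properties are available as URIs, uses an rdfs:label where one is supplied and otherwise the URI's local name, and splits camelCase and underscores into lowercase words. `dataset_document`, its helpers, and these tokenization rules are hypothetical choices, not TAPIOCA's documented preprocessing.

```python
import re


def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("/#"))[-1]


def label_tokens(label):
    """Split a label such as 'birthPlace' or 'Population_Total' into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def dataset_document(class_uris, property_uris, labels=None):
    """Build a bag-of-words document for one dataset from its schema metadata.

    labels (hypothetical) maps a URI to its rdfs:label; URIs without a label
    fall back to their local name.
    """
    labels = labels or {}
    tokens = []
    for uri in list(class_uris) + list(property_uris):
        tokens.extend(label_tokens(labels.get(uri, local_name(uri))))
    return tokens


doc = dataset_document(
    ["http://dbpedia.org/ontology/City"],
    ["http://dbpedia.org/ontology/populationTotal", "http://xmlns.com/foaf/0.1/name"],
)
print(doc)  # → ['city', 'population', 'total', 'name']
```

One such token list per dataset forms the corpus that is handed to LDA.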
“…Other keyword-based approaches apply topic modelling methods (Ben Ellefi et al., 2016a; Röder et al., 2016) based on the assumption that similar datasets should have similar topics. For example, in TAPIOCA (Röder et al., 2016), a corpus of documents, where a document characterizes a dataset by its schema metadata (class and properties labels), is used as input to the LDA (Latent Dirichlet Allocation) algorithm to create a topic model. The topic model preserves the distribution over topics for each dataset and the ranking order of the recommended datasets is determined by their topic distribution similarity.…”
Section: Dataset Recommendation For Link Discovery; citation type: mentioning; confidence: 99%
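The final ranking step can be sketched by comparing the query dataset's topic distribution with each candidate's. The excerpt only says that datasets are ranked by topic distribution similarity; using cosine similarity here is an assumption, as are the function names and the three-topic example vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query_topics, dataset_topics):
    """Return (dataset id, similarity) pairs, most similar first."""
    scored = [(name, cosine(query_topics, topics))
              for name, topics in dataset_topics.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic distributions over three topics.
datasets = {
    "geo-dataset": [0.8, 0.1, 0.1],
    "bio-dataset": [0.05, 0.9, 0.05],
    "mixed-dataset": [0.5, 0.4, 0.1],
}
ranking = rank_candidates([0.7, 0.2, 0.1], datasets)
print([name for name, _ in ranking])  # → ['geo-dataset', 'mixed-dataset', 'bio-dataset']
```

Because the topic vectors are probability distributions, a divergence such as Jensen-Shannon could be used instead; the candidates would then be sorted in ascending order, since a lower divergence means more similar topics.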