RDF2Vec: RDF Graph Embeddings for Data Mining

Ristoski, Petar; Paulheim, Heiko

doi:10.1007/978-3-319-46523-4_30

Cited by 285 publications

(245 citation statements)

References 25 publications

Mentioning: 245

“…Our results in [36] have shown that random walks are a feasible and, in contrast to other techniques such as kernels, also a well scalable approach for extracting sequences.…”

Section: Introduction (mentioning)

confidence: 91%

“…We will introduce both briefly. A more elaborated discussion can be found from the original RDF2Vec paper [36].…”

Section: Preliminaries (mentioning)

confidence: 99%

“…In [36], we have introduced RDF2Vec, a generic method for embedding entities in knowledge graphs into lower-dimensional vector spaces. The approach adapts neural language modeling techniques, specifically word2vec, which takes sequences of words to embed words into vector spaces [20,21].…”

Section: Introduction (mentioning)

confidence: 99%


Biased graph walks for RDF graph embeddings

Cochez, Ristoski, Ponzetto, et al. 2017

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Self Cite


Knowledge Graphs have been recognized as a valuable source for background information in many data mining, information retrieval, natural language processing, and knowledge extraction tasks. However, obtaining a suitable feature vector representation from RDF graphs is a challenging task. In this paper, we extend the RDF2Vec approach, which leverages language modeling techniques for unsupervised feature extraction from sequences of entities. We generate sequences by exploiting local information from graph substructures, harvested by graph walks, and learn latent numerical representations of entities in RDF graphs. We extend the way we compute feature vector representations by comparing twelve different edge weighting functions for performing biased walks on the RDF graph, in order to generate higher quality graph embeddings. We evaluate our approach using different machine learning, as well as entity and document modeling benchmark data sets, and show that the naive RDF2Vec approach can be improved by exploiting Biased Graph Walks.
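The walk-based sequence extraction described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy triples, the `ex:` names, the function names, and the predicate-frequency weighting, which stands in for the twelve edge weighting functions the abstract mentions and is not taken from the paper. The resulting sequences would then be fed to a word2vec-style model, a step not shown here.

```python
import random
from collections import defaultdict

# Toy RDF-style triples (subject, predicate, object); all names are hypothetical.
TRIPLES = [
    ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
    ("ex:Berlin", "ex:locatedIn", "ex:Europe"),
    ("ex:Germany", "ex:memberOf", "ex:EU"),
    ("ex:Germany", "ex:locatedIn", "ex:Europe"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def predicate_frequency_weights(triples):
    """Illustrative edge weighting (an assumption): count how often each predicate occurs."""
    counts = defaultdict(int)
    for _, predicate, _ in triples:
        counts[predicate] += 1
    return counts


def biased_walk(adjacency, weights, start, depth, rng):
    """Follow up to `depth` outgoing edges, sampling each edge in proportion to its weight."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = adjacency.get(node)
        if not edges:
            break  # dead end: the node has no outgoing edges
        edge_weights = [weights[predicate] for predicate, _ in edges]
        predicate, obj = rng.choices(edges, weights=edge_weights, k=1)[0]
        walk.extend([predicate, obj])
        node = obj
    return walk


def generate_walks(triples, walks_per_entity=3, depth=2, seed=0):
    """Produce token sequences starting from every entity that has outgoing edges."""
    rng = random.Random(seed)
    adjacency = build_adjacency(triples)
    weights = predicate_frequency_weights(triples)
    return [
        biased_walk(adjacency, weights, entity, depth, rng)
        for entity in sorted(adjacency)
        for _ in range(walks_per_entity)
    ]


walks = generate_walks(TRIPLES)
```

Each walk alternates entity and predicate tokens, so it can be treated like a sentence. Setting every weight to 1 would give uniform (unbiased) random walks under the same sketch.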


“…RDF2Vec [29] is a method which generates feature vectors of a given size, and does so efficiently, even for large graphs. This means that, in principle, even when faced with a machine learning problem on the scale of the web, we can reduce the problem to a set of feature vectors of, say, 500 dimensions, after which we can solve the problem on commodity hardware.…”

Section: RDF2Vec (mentioning)

confidence: 99%

The knowledge graph as the default data model for learning on heterogeneous knowledge

Wilcke, Bloem, Boer 2017


Abstract. In modern machine learning, raw data is the preferred input for our models. Where a decade ago data scientists were still engineering features, manually picking out the details we thought salient, they now prefer the data in their raw form. As long as we can assume that all relevant and irrelevant information is present in the input data, we can design deep models that build up intermediate representations to sift out relevant features. However, these models are often domain specific and tailored to the task at hand, and therefore unsuited for learning on heterogeneous knowledge: information of different types and from different domains. If we can develop methods that operate on this form of knowledge, we can dispense with a great deal more ad-hoc feature engineering and train deep models end-to-end in many more domains. To accomplish this, we first need a data model capable of expressing heterogeneous knowledge naturally in various domains, in as usable a form as possible, and satisfying as many use cases as possible. In this position paper, we argue that the knowledge graph is a suitable candidate for this data model. We further describe current research and discuss some of the promises and challenges of this approach.


“…The corpora that the algorithms are trained on can contain either natural language text (e.g. Wikipedia or newswire articles) or artificially-generated pseudo corpora, such as the output of the Random Walk on Graphs algorithm, when run to select sequences of nodes from a knowledge graph (KG) - see (Goikoetxea et al., 2015) and (Ristoski and Paulheim, 2016). We denote the pseudo corpus generated via Random Walk on Graphs algorithm as Pseudo Corpus RWG.…”

Section: Introduction (mentioning)

confidence: 99%

Towards Lexical Chains for Knowledge-Graph-based Word Embeddings

Simov, Boytcheva, Osenova 2017

RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning


Word vectors with varying dimensionalities and produced by different algorithms have been extensively used in NLP. The corpora that the algorithms are trained on can contain either natural language text (e.g. Wikipedia or newswire articles) or artificially-generated pseudo corpora due to natural data sparseness. We exploit Lexical Chain based templates over Knowledge Graph for generating pseudo-corpora with controlled linguistic value. These corpora are then used for learning word embeddings. A number of experiments have been conducted over the following test sets: WordSim353 Similarity, WordSim353 Relatedness and SimLex-999. The results show that, on the one hand, the incorporation of many-relation lexical chains improves results, but on the other hand, unrestricted-length chains remain difficult to handle with respect to their huge quantity.
