Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015
DOI: 10.3115/v1/n15-1165
Random Walks and Neural Network Language Models on Knowledge Bases

Abstract: Random walks over large knowledge bases like WordNet have been successfully used in word similarity, relatedness and disambiguation tasks. Unfortunately, those algorithms are relatively slow for large repositories, with significant memory footprints. In this paper we present a novel algorithm which encodes the structure of a knowledge base in a continuous vector space, combining random walks and neural net language models in order to produce novel word representations. Evaluation in word relatedness and simila…

Cited by 52 publications (51 citation statements). References 13 publications.
“…Our assumption is that such a sequence can be considered a context of its starting node: a set of words that are related to, and can appear together in real texts with, the word sense represented by that node, thus emulating real text sentences; to what extent this assumption holds depends of course on the structure of the LKB we are using. Previous efforts in building word embeddings have shown the plausibility of this approach (Goikoetxea et al, 2015).…”
Section: Random Walks As Contexts
confidence: 99%
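The excerpt above treats the sequence of nodes visited by a random walk as a context of its starting node. A minimal sketch of that idea, using a hypothetical toy graph in place of a real LKB like WordNet (the graph, node names, and walk length are illustrative assumptions, not the authors' setup):

```python
import random

# Hypothetical mini lexical knowledge base: nodes are words/senses,
# adjacency lists are undirected semantic relations (not real WordNet).
GRAPH = {
    "dog":    ["canine", "pet", "bark"],
    "canine": ["dog", "wolf"],
    "pet":    ["dog", "cat"],
    "bark":   ["dog", "tree"],
    "wolf":   ["canine"],
    "cat":    ["pet"],
    "tree":   ["bark"],
}

def random_walk(graph, start, length, rng):
    """Emit the node sequence visited by a uniform random walk.

    The sequence starting at `start` is read as a context of `start`:
    words that plausibly co-occur with it in real text, emulating a
    sentence. How well this holds depends on the graph's structure.
    """
    node, walk = start, [start]
    for _ in range(length - 1):
        node = rng.choice(graph[node])  # uniform step to a neighbor
        walk.append(node)
    return walk

rng = random.Random(0)
print(" ".join(random_walk(GRAPH, "dog", 5, rng)))
```

Each emitted sequence can then be handed to any standard embedding trainer as if it were an ordinary sentence.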
“…As baselines, we evaluated two text-corpus-based word embeddings that are freely available on the web, as well as the best result of Goikoetxea et al. (2015), available from the UKB web page. Thus, the pseudocorpus-based embeddings have been compared with text-based embeddings.…”
Section: Experiments Results
confidence: 99%
“…We reuse the sets of relations developed in these works to generate our Pseudo Corpora LC. Goikoetxea et al. (2015) describe an architecture in which a run of the Random Walk algorithm (Agirre et al., 2014) produces an artificial corpus from WordNet. The graph that is fed to the algorithm is composed of WordNet synsets (the graph nodes) and of different types of relations between them (the graph arcs; some relation types are antonymy, hypernymy, derivation, etc.).…”
Section: Related Work
confidence: 99%
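The pipeline described above — repeated random walks over a synset graph emitting an artificial corpus — can be sketched as follows. This is a simplified assumption-laden illustration: the mini graph stands in for WordNet, and the 0.85 continuation probability is the common PageRank-style damping value, not necessarily the exact parameter used by the authors:

```python
import random

# Hypothetical stand-in for a WordNet-like graph: synset-style node
# names, adjacency lists for relations (hypernymy, meronymy, ...).
GRAPH = {
    "car.n.01":     ["vehicle.n.01", "wheel.n.01"],
    "vehicle.n.01": ["car.n.01", "truck.n.01"],
    "wheel.n.01":   ["car.n.01", "tire.n.01"],
    "truck.n.01":   ["vehicle.n.01"],
    "tire.n.01":    ["wheel.n.01"],
}

def walk_with_restart(graph, start, rng, damping=0.85, max_len=20):
    """One pseudo-sentence: walk until a random restart event ends it.

    At each step the walk continues with probability `damping` and
    stops otherwise, so sentence lengths follow a geometric
    distribution, loosely mimicking natural sentence lengths.
    """
    node, sentence = start, [start]
    while len(sentence) < max_len and rng.random() < damping:
        node = rng.choice(graph[node])
        sentence.append(node)
    return sentence

def pseudo_corpus(graph, n_sentences, rng):
    """Generate an artificial corpus of pseudo-sentences."""
    nodes = sorted(graph)
    return [walk_with_restart(graph, rng.choice(nodes), rng)
            for _ in range(n_sentences)]

rng = random.Random(42)
for sent in pseudo_corpus(GRAPH, 3, rng):
    print(" ".join(sent))
```

The printed lines form the artificial corpus, which would then be fed to a skip-gram trainer (e.g. word2vec) exactly like ordinary text.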
“…Complementary to this, a plethora of works in Natural Language Processing (NLP) has recently focused on combining knowledge bases with distributional information from text. These include approaches that modify Word2Vec [15] to learn sense embeddings [5], methods to enrich WordNet with embeddings for synsets and lexemes [21], to acquire continuous word representations by combining random walks over knowledge bases and neural language models [11], or to produce joint lexical and semantic vectors for sense representation from text and knowledge bases [4]. In this paper, we follow this line of research and take it one step forward by producing a hybrid knowledge resource, which combines symbolic and statistical meaning representations while i) staying purely on the lexical-symbolic level, ii) explicitly distinguishing word senses, and iii) being human readable. Far from being technicalities, such properties are crucial to be able to embed a resource of this kind into the Semantic Web ecosystem, where human-readable distributional representations are explicitly linked to URIfied semantic resources.…”
Section: Introduction
confidence: 99%