This paper presents the NTUNLP systems in the long track and the short track of the Entity Recognition and Disambiguation Challenge 2014. We first create a dictionary that contains the possible surface forms of Freebase IDs, then scan the given text from left to right with a longest-match strategy to detect mentions, and eliminate unwanted surface forms based on a stop-word list. Methods to link to the most relevant entities and select the best candidate are proposed for the two tracks, respectively. Outside resources such as DBpedia Spotlight and TAGME are integrated into our basic NTUNLP systems. Various experimental setups are presented and discussed on the development set. In the formal run, one NTUNLP system wins first prize in the short track and another NTUNLP system takes fourth place in the long track.
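The dictionary-based mention detection described above can be sketched as follows; this is a minimal illustration, not the authors' implementation, and the function and variable names are assumptions:

```python
def detect_mentions(tokens, surface_forms, stop_words, max_len=5):
    """Scan tokens left to right, preferring the longest dictionary match.

    surface_forms: set of known surface forms (e.g. from a Freebase-ID
    dictionary); stop_words: surface forms to discard. Both are toy
    stand-ins for the resources described in the abstract.
    """
    mentions = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, shrinking until a match.
        for j in range(min(i + max_len, len(tokens)), i, -1):
            span = " ".join(tokens[i:j])
            if span in surface_forms and span.lower() not in stop_words:
                match = (i, j, span)
                break
        if match:
            mentions.append(match)
            i = match[1]  # resume scanning after the matched span
        else:
            i += 1
    return mentions
```

For example, given the dictionary entries "New York" and "New York Times", the scanner returns the longer mention "New York Times" rather than stopping at "New York".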
While many traditional studies on semantic relatedness utilize lexical databases such as WordNet or Wiktionary, recent word embedding learning approaches have demonstrated their ability to capture syntactic and semantic information and to outperform lexicon-based methods. However, word senses are not disambiguated in the training phase of either Word2Vec or GloVe, two well-known word embedding algorithms, and the path length between any two word senses in a lexical database cannot fully reflect their true semantic relatedness. In this paper, a novel approach that linearly combines Word2Vec and GloVe with the lexical database WordNet is proposed for measuring semantic relatedness. The experiments show that this simple method outperforms the state-of-the-art model SensEmbed.
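The linear combination described above can be illustrated with a short sketch. The weight `alpha` and the pre-computed similarity inputs are assumptions for illustration; the paper's actual weighting scheme and similarity sources may differ:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def combined_relatedness(emb_sim, wordnet_sim, alpha=0.7):
    """Linearly combine an embedding-based similarity (e.g. cosine over
    Word2Vec/GloVe vectors) with a WordNet-based similarity.
    alpha is an illustrative interpolation weight, not the paper's value."""
    return alpha * emb_sim + (1 - alpha) * wordnet_sim
```

In practice `emb_sim` would come from `cosine()` over the two words' embeddings and `wordnet_sim` from a WordNet path-based measure over their senses.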
In this research, we propose 3 different approaches to measure the semantic relatedness between 2 words: (i) boost the performance of the GloVe word embedding model by removing or transforming abnormal dimensions; (ii) linearly combine the information extracted from WordNet and word embeddings; and (iii) use word embeddings together with 12 linguistic features extracted from WordNet as input to Support Vector Regression. We conducted our experiments on 8 benchmark data sets and computed Spearman correlations between the outputs of our methods and the ground truth. We report our results alongside 3 state-of-the-art approaches. The experimental results show that our method outperforms the state-of-the-art approaches on all the selected English benchmark data sets.
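Approach (iii) above can be sketched with scikit-learn's `SVR`. The feature values, ratings, and hyperparameters below are toy assumptions standing in for the embedding similarity and the 12 WordNet-derived linguistic features, not the paper's actual setup:

```python
from sklearn.svm import SVR

# Each row is a word pair's feature vector; here just two illustrative
# features (embedding cosine, WordNet path similarity) instead of the
# full feature set described in the abstract.
X_train = [
    [0.92, 0.80],   # e.g. a highly related pair
    [0.75, 0.50],   # e.g. a moderately related pair
    [0.10, 0.05],   # e.g. an unrelated pair
]
y_train = [9.5, 8.7, 0.5]   # toy human relatedness ratings

# Train the regressor and score a new, unseen word pair.
model = SVR(kernel="rbf", C=1.0)
model.fit(X_train, y_train)
pred = model.predict([[0.85, 0.70]])
```

Spearman correlation against the gold ratings (e.g. via `scipy.stats.spearmanr`) would then evaluate the predicted scores, as the abstract describes.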