Semantic entities are entities whose concepts are available in a knowledge base. This paper introduces a new system for extracting semantic entities from text. To this end, a new disambiguation method is proposed that matches each ambiguous entity to one of the semantic entities in the knowledge base. The method uses the YAGO ontology as a state-of-the-art knowledge base in this field. Because entities in YAGO are meaningful, the entities obtained by this method are semantic entities. A comparison with results reported in the literature suggests that the results of the new approach are sufficiently reliable.
Recommendation systems are often evaluated based on users’ interactions that were collected from an existing, already deployed recommendation system. In this situation, users only provide feedback on the exposed items, and they may not leave feedback on other items because the deployed system never exposed them to those items. As a result, the collected feedback dataset that is used to evaluate a new model is influenced by the deployed system, as a form of closed loop feedback. In this article, we show that the typical offline evaluation of recommender systems suffers from the so-called Simpson’s paradox. Simpson’s paradox is the name given to a phenomenon observed when a significant trend appears in several different sub-populations of observational data but disappears or is even reversed when these sub-populations are combined. Our in-depth experiments based on stratified sampling reveal that a very small minority of items that are frequently exposed by the deployed system acts as a confounding factor in the offline evaluation of recommendation systems. In addition, we propose a novel evaluation methodology that takes into account the confounder, i.e., the deployed system’s characteristics. Using the relative comparison of many recommendation models as in the typical offline evaluation of recommender systems, and based on the Kendall rank correlation coefficient, we show that our proposed evaluation methodology exhibits statistically significant improvements of 14% and 40% on the examined open loop datasets (Yahoo! and Coat), respectively, in reflecting the true ranking of systems with an open loop (randomised) evaluation in comparison to the standard evaluation.
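The reversal that defines Simpson’s paradox can be illustrated with a small numeric sketch. Everything in it is an assumption made for illustration: the model names, the two item strata ("head" for frequently exposed items, "tail" for the rest), and the hit counts are hypothetical and are not taken from the article's experiments.

```python
from fractions import Fraction

# Hypothetical (hits, recommendations) counts per item stratum.
# "head" = items frequently exposed by the deployed system, "tail" = the rest.
counts = {
    "model_A": {"head": (192, 263), "tail": (81, 87)},
    "model_B": {"head": (55, 80), "tail": (234, 270)},
}

def stratum_rates(model):
    """Hit rate of a model inside each stratum."""
    return {s: Fraction(h, n) for s, (h, n) in counts[model].items()}

def pooled_rate(model):
    """Hit rate of a model when the strata are merged."""
    hits = sum(h for h, _ in counts[model].values())
    total = sum(n for _, n in counts[model].values())
    return Fraction(hits, total)

# Model A is better within every stratum ...
a_wins_each_stratum = all(
    stratum_rates("model_A")[s] > stratum_rates("model_B")[s]
    for s in ("head", "tail")
)
# ... yet model B looks better once the strata are combined.
b_wins_pooled = pooled_rate("model_B") > pooled_rate("model_A")

print(a_wins_each_stratum, b_wins_pooled)
```

The reversal arises because the two hypothetical models are evaluated on very different mixes of head and tail items, so the stratum acts as a confounder when the counts are pooled.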
The emergence of knowledge repositories in a variety of domains provides a valuable opportunity for the semantic interpretation of high-dimensional datasets. Previous research has investigated the use of concepts instead of words as the core semantic feature for incorporating semantic knowledge from an ontology into the representation model of documents. In machine learning and information retrieval, however, data objects are represented as flat feature vectors. The inconsistency between the structural nature of knowledge repositories and the flat representation of features in machine learning leads researchers to neglect the structure of the knowledge base and to use concepts as isolated semantic features, an approach known as bag-of-concepts. Although using concepts has some advantages over words, neglecting the relations between concepts leaves the vocabulary mismatch problem unresolved. In this paper, a novel semantic kernel is proposed that is capable of incorporating the relatedness between conceptual features. This kernel leverages clique theory to map data objects to a novel feature space in which complex data objects become comparable. The proposed kernel is relevant to all applications that have prior knowledge about the relatedness between features. We concentrate on representing text documents and words using Wikipedia and WordNet, respectively. Experimental results on a set of benchmark datasets show that the proposed kernel significantly improves the representation of both words and texts in the application of semantic relatedness.
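The vocabulary mismatch left by bag-of-concepts, and the general idea of letting a kernel use relatedness between features, can be sketched as follows. This is not the clique-based kernel proposed in the paper; it is a generic bilinear form k(x, y) = xᵀ S y, and the concept vocabulary, the relatedness matrix S, and the document vectors are all hypothetical values chosen for illustration.

```python
# Hypothetical concept vocabulary and pairwise relatedness scores (symmetric).
concepts = ["car", "automobile", "banana"]
relatedness = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

def bag_of_concepts_kernel(x, y):
    """Plain dot product: concepts are treated as isolated features."""
    return sum(a * b for a, b in zip(x, y))

def relatedness_kernel(x, y, s):
    """Bilinear form x^T S y: related concepts contribute to the similarity."""
    return sum(
        x[i] * s[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )

# Three hypothetical documents, each mentioning exactly one concept.
doc_a = [1.0, 0.0, 0.0]  # mentions "car"
doc_b = [0.0, 1.0, 0.0]  # mentions "automobile"
doc_c = [0.0, 0.0, 1.0]  # mentions "banana"

print(bag_of_concepts_kernel(doc_a, doc_b))           # no shared concept
print(relatedness_kernel(doc_a, doc_b, relatedness))  # related concepts
print(relatedness_kernel(doc_a, doc_c, relatedness))  # unrelated concepts
```

Under the plain dot product, the documents about "car" and "automobile" share no feature and so look entirely dissimilar, whereas the relatedness-aware form assigns them a nonzero similarity while keeping unrelated documents apart.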