The amount of semantic data on the web has been growing rapidly in recent years. One of the key challenges triggered by this growth is the ad-hoc querying, i.e., the ability to retrieve answers from semantic resources using natural language queries. This facilitates interaction with semantic resources for the users so they can benefit from the knowledge covered by semantic data without the complexities of semantic query languages. In this paper, we focus on semantic queries, where the aim is to retrieve objects belonging to a set of semantically related entities. An example of such an adhoc type query is "Apollo astronauts who walked on the Moon". In order to address the task, we propose the SemSets retrieval model that exploits and combines traditional document-based information retrieval, link structure of the semantic data and entity membership in semantic sets, in order to provide the answers. The novelty of the approach lies in the utilization of semantic sets, i.e., groups of semantically related entities. We propose two approaches to identify such semantic sets from the knowledge bases; the first one requires involvement of an expert user knowledgeable of the data set structure, the second one is fully automatic and provides results that are comparable with those delivered by the expert users. As demonstrated in the experimental evaluation, the proposed model has the state-of-the-art performance on the SemSearch2011 data set, which has been designed especially for the semantic list search evaluation.
A significant number of graph database systems has emerged in the past few years. Most aim at the management of the property graph data structure: where graph elements can be assigned with properties. In this paper, we address the need to compare the performance of different graph databases, and discuss the challenges of developing fair benchmarking methodologies. We believe that, compared to other database systems, the ability to efficiently traverse over the graph topology is unique to graph databases. As such, we focus our attention on the benchmarking of traversal operations. We describe the design of the graph traversal benchmark and present its results. The benchmark provides the means to compare the performance of different data management systems and gives us insight into the abilities and limitations of modern graph databases.
The community detection in networks is a prominent task in the graph data mining, because of the rapid emergence of the graph data; e.g., information networks or social networks. In this paper, we propose a new algorithm for detecting communities in networks. Our approach differs from others in the ability of constraining the size of communities being generated, a property important for a class of applications. In addition, the algorithm is greedy in nature and belongs to a small family of community detection algorithms with the pseudo-linear time complexity, making it applicable also to large networks. The algorithm is able to detect small-sized clusters independently of the network size. It can be viewed as complementary approach to methods optimizing modularity, which tend to increase the size of generated communities with the increase of the network size. Extensive evaluation of the algorithm on synthetic benchmark graphs for community detection showed that the proposed approach is very competitive with state-of-the-art methods, outperforming other approaches in some of the settings.
Graph clustering, often addressed as community detection, is a prominent task in the domain of graph data mining with dozens of algorithms proposed in recent years. In this paper, we focus on several popular community detection algorithms with low computational complexity and with decent performance on the artificial benchmarks, and we study their behaviour on real-world networks. Motivated by the observation that there is a class of networks for which the community detection methods fail to deliver good community structure, we examine the assortativity coefficient of ground-truth communities and show that assortativity of a community structure can be very different from the assortativity of the original network. We then examine the possibility of exploiting the latter by weighting edges of a network with the aim to improve the community detection outputs for networks with assortative community structure. The evaluation shows that the proposed weighting can significantly improve the results of community detection methods on networks with assortative community structure.
Abstract. Nowadays, capturing the knowledge in ontological structures is one of the primary focuses of the semantic web research. To exploit the knowledge from the vast quantity of existing unstructured texts available in natural languages in ontologies, tools for automatic semantic annotation (ASA) are heavily needed. In this paper, we present the ASA tool Ontea and empowering of the method by Grid technology for performance increase, which help us in delivering formalized semantic data in shorter time. We have adjusted Ontea annotation algorithm to be executable in the distributed grid environment. We also give performance evaluation of Ontea algorithm and experimental results from cluster and grid implementation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.