This paper presents a method to integrate external knowledge sources such as DBpedia and OpenCyc into an ontology learning system that automatically suggests labels for unknown relations in domain ontologies based on large corpora of unstructured text. The method extracts and aggregates verb vectors from semantic relations identified in the corpus. It composes a knowledge base which consists of (i) verb centroids for known relations between domain concepts, (ii) mappings between concept pairs and the types of known relations, and (iii) ontological knowledge retrieved from external sources. Applying semantic inference and validation to this knowledge base yields a refined relation label suggestion. A formal evaluation compares the accuracy and average ranking precision of this hybrid method with the performance of methods that solely rely on corpus data and those that are only based on reasoning and external data sources.
Abstract. Crowdsourcing techniques provide effective means for solving a variety of ontology engineering problems. Yet, they are mainly used as external support to ontology engineering, without being closely integrated into the work of ontology engineers. In this paper we investigate how to closely integrate crowdsourcing into ontology engineering practices. Firstly, we show that a set of basic crowdsourcing tasks are used recurrently to solve a range of ontology engineering problems. Secondly, we present the uComp Protégé plugin that facilitates the integration of such typical crowdsourcing tasks into ontology engineering from within the Protégé ontology editor. An evaluation of the plugin in a typical ontology engineering scenario where ontologies are built from automatically learned semantic structures, shows that its use reduces the working times for the ontology engineers 11 times, lowers the overall task costs by 40% to 83% depending on the crowdsourcing settings used and leads to data quality comparable with that of tasks performed by ontology engineers. Evaluations on a large anatomy ontology confirm that crowdsourcing is a scalable and effective method: good quality results (accuracy of 89% and 99%) are obtained while achieving cost reductions of 75% from the ontology engineer costs and providing comparable overall task duration.
Abstract:The identification and labelling of non-hierarchical relations are among the most challenging tasks in ontology learning. This paper describes a bottom-up approach for automatically suggesting ontology link types. The presented method extracts verb vectors from semantic relations identified in the domain corpus, aggregates them by computing centroids for known relation types and stores the centroids in a central Knowledge Base (KB). Comparing verb vectors extracted from unknown relations with the stored centroids yields link-type suggestions. Domain experts evaluate these suggestions, refining the KB and constantly improving the components accuracy. Using four sample ontologies on 'energy sources', this paper demonstrates how link-type suggestion aids the ontology design process. It also provides a statistical analysis on the accuracy and average ranking performance of Batch Learning (BL) vs. Online Learning (OL).
As the manual creation of domain models and also of linked data is very costly, the extraction of knowledge from structured and unstructured data has been one of the central research areas in the Semantic Web field in the last two decades. Here, we look specifically at the extraction of formalized knowledge from natural language text, which is the most abundant source of human knowledge available. There are many tools on hand for information and knowledge extraction for English natural language, for written Thai language the situation is different. The goal of this work is to assess the state-of-the-art of research on formal knowledge extraction specifically from Thai language text, and then give suggestions and practical research ideas on how to improve the state-of-the-art. To address the goal, first we distinguish nine knowledge extraction for the Semantic Web tasks defined in literature on knowledge extraction from English text, for example taxonomy extraction, relation extraction, or named entity recognition. For each of the nine tasks, we analyze the publications and tools available for Thai text in the form of a comprehensive literature survey. Additionally to our assessment, we measure the self-assessment by the Thai research community with the help of a questionnaire-based survey on each of the tasks. Furthermore, the structure and size of the Thai community is analyzed using complex literature database queries. Combining all the collected information we finally identify research gaps in knowledge extraction from Thai language. An extensive list of practical research ideas is presented, focusing on concrete suggestions for every knowledge extraction task -which can be implemented and evaluated with reasonable effort. Besides the task-specific hints for improvements of the state-of-the-art, we also include general recommendations on how to raise the efficiency of the respective research community. key words: knowledge extraction, Thai language text, landscape analysis, semantic web † †
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.