The Semantic Web realization depends on the availability of critical mass of metadata for the web content, linked to formal knowledge about the world. This paper presents our vision about a holistic system allowing annotation, indexing, and retrieval of documents with respect to real-world entities. A system (called KIM), partially implementing this concept is shortly presented and used for evaluation and demonstration. Our understanding is that a system for semantic annotation should be based upon specific knowledge about the world, rather than indifferent to any ontological commitments and general knowledge. To assure efficiency and reusability of the metadata we introduce a simplistic upper-level ontology which starts with some basic philosophic distinctions and goes down to the most popular entity types (people, companies, cities, etc.), thus providing many of the inter-domain common sense concepts and allowing easy domain-specific extensions. Based on the ontology, an extensive knowledge base of entities descriptions is maintained. Semantically enhanced information extraction system providing automatic annotation with references to classes in the ontology and instances in the knowledge base is presented. Based on these annotations, we perform IR-like indexing and retrieval, further extended using the ontology and knowledge about the specific entities.
The KIM platform provides a novel Knowledge and Information Management framework and services for automatic semantic annotation, indexing, and retrieval of documents. It provides a mature and semantically enabled infrastructure for scalable and customizable information extraction (IE) as well as annotation and document management, based on GATE. 1 Our understanding is that a system for semantic annotation should be based upon a simple model of real-world entity concepts, complemented with quasi-exhaustive instance knowledge. To ensure efficiency, easy sharing, and reusability of the metadata we introduce an upper-level ontology. Based on the ontology, a large-scale instance base of entity descriptions is maintained. The knowledge resources involved are handled by use of state-of-the-art Semantic Web technology and standards, including RDF(S) repositories, ontology middleware and reasoning. From a technical point of view, the platform allows KIM-based applications to use it for automatic semantic annotation, for content retrieval based on semantic queries, and for semantic repository access. As a framework, KIM also allows various IE modules, semantic repositories and information retrieval engines to be plugged into it. This paper presents the KIM platform, with an emphasis on its architecture, interfaces, front-ends, and other technical issues. 2 http://www.searchlores.org, the home of a group of "web lore searching" researchers.
An explosion in the use of RDF, first as an annotation language and later as a data representation language, has driven the requirements for Web-scale server systems that can store and process huge quantities of data, and furthermore provide powerful data access and mining functionalities. This paper describes OWLIM, a family of semantic repositories that provide storage, inference and novel data-access features delivered in a scalable, resilient, industrial-strength platform.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.