The Web is rich in tables (e.g., HTML tables, spreadsheets, Google Fusion Tables) that host a considerable wealth of high-quality relational data. Unlike unstructured text, tables usually lend themselves to automatic data extraction because of their regular structure and properties. Data extraction is usually complemented by annotation of the table, which determines its semantics by identifying a type for each column, the relations between columns, if any, and the entities that occur in each cell. In this paper, we focus on the problem of discovering and annotating entities in tables. More specifically, we describe an algorithm that identifies the rows of a table that contain information on entities of specific types (e.g., restaurant, museum, theatre) derived from an ontology, and that determines the cells in which the names of those entities occur. We implemented this algorithm while developing a faceted browser over a repository of RDF data on points of interest in cities, which we extracted from Google Fusion Tables. We claim that our algorithm complements existing approaches, which annotate entities in a table based on a pre-compiled reference catalogue that lists the types of a finite set of entities; as a result, they are unable to discover and annotate entities that do not belong to the reference catalogue. Instead, we train our algorithm to look for information on previously unseen entities on the Web so as to annotate them with the correct type.
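The limitation of catalogue-based annotation mentioned above can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption made for illustration: the catalogue contents, the sample table, and the function name `annotate_with_catalogue`. It is not the algorithm described in the paper; it only shows that a lookup against a fixed catalogue leaves rows about unseen entities unannotated.

```python
# Hypothetical reference catalogue: normalised entity name -> ontology type.
# A real catalogue would be far larger; these two entries are illustrative.
CATALOGUE = {
    "louvre": "museum",
    "la scala": "theatre",
}


def annotate_with_catalogue(rows):
    """For each row, return (column index, type) of the first cell whose
    text appears in the catalogue, or None when no cell matches."""
    annotations = []
    for row in rows:
        hit = None
        for col, cell in enumerate(row):
            entity_type = CATALOGUE.get(cell.strip().lower())
            if entity_type is not None:
                hit = (col, entity_type)
                break
        annotations.append(hit)
    return annotations


# Toy table: name, city, latitude. The second entity is not in the catalogue.
table = [
    ["Louvre", "Paris", "48.8606"],
    ["Trattoria da Mario", "Florence", "43.7765"],
]
annotations = annotate_with_catalogue(table)
```

In this toy table the second row stays unannotated because its entity is absent from the catalogue, which is the gap the abstract says the proposed algorithm aims to close by looking for information on the Web.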
To identify mappings between the concepts of two ontologies, many recent alignment systems rely on complementary knowledge, known as background or support knowledge, which is most often represented as a third ontology. Their common objective is to complement classical matching techniques, which exploit the structure of the ontologies or the richness of their representation language, and which no longer apply when the ontologies to be matched are weakly structured or are simple taxonomies. This paper has two parts. The first surveys research that uses background knowledge: it introduces the general scheme these approaches share, then analyses them according to the kind of background knowledge they use. The second part is devoted to the TaxoMap alignment system. We present the system, its general architecture, and its context of use. We then describe how WordNet is exploited in TaxoMap as background knowledge, together with the experimental results obtained.
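The general idea of matching through background knowledge can be sketched as follows. This is an illustrative assumption, not TaxoMap's actual procedure: the `HYPERNYM` dictionary is a tiny hand-written stand-in for a resource such as WordNet, and the function names `ancestors` and `map_concept` are hypothetical. The sketch maps a source concept to a target concept either by identical label or through a more general term found in the background hierarchy.

```python
# Hypothetical background hierarchy standing in for a resource like WordNet:
# each term maps to its direct hypernym (its more general term).
HYPERNYM = {
    "poodle": "dog",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
}


def ancestors(term):
    """Return the chain of increasingly general terms above `term`."""
    chain = []
    while term in HYPERNYM:
        term = HYPERNYM[term]
        chain.append(term)
    return chain


def map_concept(source_label, target_labels):
    """Return ("equivalent", target) on a label match, ("is-a", target) for
    the closest ancestor present in the target taxonomy, or None."""
    label = source_label.lower()
    targets = {t.lower() for t in target_labels}
    if label in targets:
        return ("equivalent", label)
    for ancestor in ancestors(label):
        if ancestor in targets:
            return ("is-a", ancestor)
    return None


# Toy target taxonomy with no concept labelled "poodle" or "dog".
target_taxonomy = ["Animal", "Mammal", "Plant"]
mapping = map_concept("Poodle", target_taxonomy)
```

Label comparison alone finds nothing for "Poodle" here; the background hierarchy supplies the intermediate terms that connect it to the closest target concept, "mammal".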