Abstract. Relational databases are valuable sources for ontology learning. Methods and tools have been proposed to generate ontologies from such structured input. However, a major persisting limitation is the derivation of ontologies with flat structure that simply mirror the schema of the source databases. In this paper, we show how the RDBToOnto tool can be used to derive accurate ontologies by taking advantage of both the database schema and the data, and more specifically through identification of taxonomies hidden in the data. This extensible tool supports an iterative approach that allows progressive refinement of the learning process through user-defined constraints.
Motivation. Ontology learning from relational databases is not a new research issue. Several methods and tools have been developed to deal with such structured input (e.g. [1][2][3]). However, a major persisting limitation of the existing methods is the derivation of ontologies with flat structure that simply mirror the schema of the source databases. For example, the DataMaster Protégé plugin [3] is a convenient tool for importing schema definitions and data into Protégé, but the target populated models are simply based on ontologies of the relational model (such as Relational.OWL [4]). Such tools can significantly ease the transition by automatically expressing legacy data in ontology representation formats. However, the results might not fully meet the expectations of users who are primarily attracted by the rich expressive power of semantic web formalisms and who could hardly be satisfied with target knowledge repositories that look like their source relational databases. A natural expectation is that the learning process yields ontologies that better capture the underlying conceptual structure of the stored data.
Ontologies with flat structure are the typical result of learning techniques that exploit only the schema without considering the data. One of the main motivations behind the RDBToOnto tool is to implement a process that learns populated ontologies with rich taxonomies by exploiting both the schema and the data when identifying the ontology structure.
Acquiring and updating terminological resources are difficult and tedious tasks, especially when semantic information should be provided. This paper deals with Term Semantic Categorization. The goal of this process is to assign semantic categories to unknown technical terms. We propose two approaches to the problem that rely on different knowledge sources. The exogeneous approach exploits contextual information extracted from corpora. The endogeneous approach relies on a lexical analysis of the technical terms. After describing the two implemented methods, we present the experiments that we conducted on significant test sets. The results demonstrate that term categorization can provide a reliable help in the terminology acquisition processes.