Purpose
This paper describes a software architecture that automatically adds semantic capabilities to data services. The proposed architecture, called OntoGenesis, is able to semantically enrich data services, so that they can dynamically provide both semantic descriptions and data representations.
Design/methodology/approach
The enrichment approach is designed to intercept the requests from data services. Therefore, a domain ontology is constructed and evolved in accordance with the syntactic representations provided by such services in order to define the data concepts. In addition, a property matching mechanism is proposed to exploit the potential data intersection observed in data service representations and external data sources so as to enhance the domain ontology with new equivalences triples. Finally, the enrichment approach is capable of deriving on demand a semantic description and data representations that link to the domain ontology concepts.
Findings
Experiments were performed using real-world datasets, such as DBpedia, GeoNames as well as open government data. The obtained results show the applicability of the proposed architecture and that it can boost the development of semantic data services. Moreover, the matching approach achieved better performance when compared with other existing approaches found in the literature.
Research limitations/implications
This work only considers services designed as data providers, i.e., services that provide an interface for accessing data sources. In addition, our approach assumes that both data services and external sources – used to enhance the domain ontology – have some potential of data intersection. Such assumption only requires that services and external sources share particular property values.
Originality/value
Unlike most of the approaches found in the literature, the architecture proposed in this paper is meant to semantically enrich data services in such way that human intervention is minimal. Furthermore, an automata-based index is also presented as a novel method that significantly improves the performance of the property matching mechanism.