The increasing importance of ontologies has resulted in the development of a large number of ontologies in both coordinated and non-coordinated efforts. The number and complexity of such ontologies make it hard for ontology and tool developers to select which ontologies to use and reuse. So far, there is no mechanism for making such decisions in an informed manner. Consequently, methods for evaluating ontology quality are required. OQuaRE is a method for ontology quality evaluation that adapts the SQuaRE standard for software product quality to ontologies. OQuaRE has been applied to identify the strengths and weaknesses of different ontologies but, so far, the framework itself has not been evaluated. Therefore, in this paper we present an evaluation of OQuaRE, performed by an international panel of experts in ontology engineering. The results include the positive and negative aspects of the current version of OQuaRE, the completeness and utility of the quality metrics included in OQuaRE, and a comparison between the results of the manual evaluations done by the experts and those obtained by a software implementation of OQuaRE.
Objective To (1) evaluate the GoodOD guideline for ontology development by applying the OQuaRE evaluation method and metrics to the ontology artefacts that were produced by students in a randomized controlled trial, and (2) informally compare the OQuaRE evaluation method with gold-standard and competency-question-based evaluation methods, respectively. Background In recent decades, many methods for ontology construction and ontology evaluation have been proposed. However, none of them has become a standard, and there is no empirical evidence from comparative evaluations of such methods. This paper brings together GoodOD and OQuaRE. GoodOD is a guideline for developing robust ontologies. It was previously evaluated in a randomized controlled trial employing metrics based on gold-standard ontologies and competency questions as outcome parameters. OQuaRE is a method for ontology quality evaluation that adapts the SQuaRE standard for software product quality to ontologies and has been successfully used for evaluating the quality of ontologies. Methods In this paper, we evaluate the effect of training in ontology construction based on the GoodOD guideline within the OQuaRE quality evaluation framework and compare the results with those obtained in the previous studies based on the same data. Results Our results show a significant effect of the GoodOD training on the developed ontologies, by topic: (a) a highly significant effect was detected in three topics in the analysis of the ontologies of untrained and trained students; (b) both positive and negative training effects with respect to the gold standard were found for five topics. Conclusion The GoodOD guideline had a significant effect on the quality of the ontologies developed. Our results show that GoodOD ontologies can be effectively evaluated using OQuaRE and that OQuaRE is able to provide additional useful information about the quality of the GoodOD ontologies.
The development and application of biological ontologies have increased significantly in recent years. These ontologies can be retrieved from different repositories, which provide little information about the quality of the ontologies. In the past years, some ontology structural metrics have been proposed, but their validity as measurement instruments has not been sufficiently studied to date. In this work, we evaluate a set of reproducible and objective ontology structural metrics. Given the lack of standard methods for this purpose, we have applied an evaluation method based on the stability and goodness of the classifications of ontologies produced by each metric on an ontology corpus. The evaluation has been done using ontology repositories as corpora; more concretely, we have used 119 ontologies from the OBO Foundry repository and 78 ontologies from AgroPortal. First, we study the correlations between the metrics. Second, we study whether the clusters produced by a given metric are stable and have a good structure. The results show that the existing correlations do not bias the evaluation, that no metric generates unstable clusterings, and that all the metrics evaluated provide at least a reasonable clustering structure. Furthermore, our work makes it possible to review and suggest the most reliable ontology structural metrics in terms of the stability and goodness of their classifications. Availability: http://sele.inf.um.es/ontology-metrics
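The evaluation scheme described above, in which an ontology corpus is partitioned by each metric's values and the goodness of the resulting partition is scored, can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: it uses a simple one-dimensional k-means and the mean silhouette coefficient as the goodness criterion, over invented metric values.

```python
# Hypothetical sketch: partition an ontology corpus by the values of one
# structural metric (1-D k-means) and score the goodness of the clustering
# with the mean silhouette coefficient.

def kmeans_1d(values, k, iters=100):
    """Cluster scalar metric values into k groups (assumes k >= 2)."""
    lo, hi = min(values), max(values)
    # spread the initial centroids evenly over the observed range
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

def silhouette(values, labels):
    """Mean silhouette coefficient of a 1-D clustering (needs >= 2 clusters)."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # |v - v| contributes 0, so dividing by len(own) - 1 excludes self
        a = sum(abs(v - x) for x in own) / (len(own) - 1)
        b = min(sum(abs(v - x) for x in grp) / len(grp)
                for m, grp in clusters.items() if m != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# invented metric values for six ontologies: two clearly separated groups
depths = [0.10, 0.12, 0.11, 0.90, 0.95, 0.92]
groups = kmeans_1d(depths, k=2)
print(round(silhouette(depths, groups), 2))  # → 0.97
```

A metric whose clusterings score consistently high under such a goodness criterion, and change little under resampling of the corpus, would be judged stable and well-structured in the sense used above.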
Background The biomedical community has now developed a significant number of ontologies. The curation of biomedical ontologies is a complex task, and biomedical ontologies evolve rapidly, so new versions are regularly and frequently published in ontology repositories. As a result, a high number of ontology versions accumulates over a short time span. Given this level of activity, ontology designers need support for the effective management of the evolution of biomedical ontologies, as the different changes may affect the engineering and quality of the ontology. This is why methods are needed that contribute to the analysis of the effects of changes and of the evolution of ontologies. Results In this paper we approach this issue from the ontology quality perspective. In previous work we developed an ontology evaluation framework based on quantitative metrics, called OQuaRE. Here, OQuaRE is used as a core component in a method that enables the analysis of the different versions of biomedical ontologies using the quality dimensions included in OQuaRE. Moreover, we describe and use two scales for evaluating the changes between the versions of a given ontology. The first is the static scale used in OQuaRE; the second is a new, dynamic scale based on the observed values of the quality metrics over a corpus defined by all the versions of a given ontology (its life-cycle). In this work we explain how OQuaRE can be adapted for understanding the evolution of ontologies. Its use is illustrated with EDAM, the ontology of bioinformatics operations, types of data, formats, and topics. Conclusions The two scales included in OQuaRE provide complementary information about the evolution of the ontologies. The application of the static scale, which is the original OQuaRE scale, to the versions of the EDAM ontology reveals a design based on good ontological engineering principles.
The application of the dynamic scale has enabled a more detailed analysis of the evolution of the ontology, measured through differences between versions. The statistics of change based on the OQuaRE quality scores make it possible to identify key versions where changes in the engineering of the ontology triggered a change from the OQuaRE quality perspective. In the case of EDAM, this study allowed us to identify that, when consecutive versions are compared pairwise, the fifth version of the ontology has the largest impact on the quality metrics of the ontology.
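The dynamic-scale idea described above can be sketched as follows. This is a hypothetical illustration, not code from the paper: each metric is min-max rescaled over the values observed across the whole life-cycle, and the pair of consecutive versions with the largest total rescaled change is flagged. The metric name and the scores are invented.

```python
# Hypothetical sketch of the dynamic scale: rescale each quality metric
# over the values observed across all versions of an ontology, then flag
# the consecutive-version pair with the largest total change.

def dynamic_scale(series):
    # min-max rescaling over the observed life-cycle values
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]

def most_changed_version(history):
    """history: one dict of {metric_name: raw score} per release, in order.
    Returns the 1-based number of the version that changed most from its
    predecessor."""
    metrics = history[0].keys()
    scaled = {m: dynamic_scale([h[m] for h in history]) for m in metrics}
    deltas = [sum(abs(scaled[m][i] - scaled[m][i - 1]) for m in metrics)
              for i in range(1, len(history))]
    # deltas[0] is the change from version 1 to version 2, hence the +2
    return max(range(len(deltas)), key=deltas.__getitem__) + 2

# invented scores for one OQuaRE-style metric over six releases
history = [{"WMCOnto": 1.0}, {"WMCOnto": 1.1}, {"WMCOnto": 1.2},
           {"WMCOnto": 1.2}, {"WMCOnto": 3.0}, {"WMCOnto": 3.1}]
print(most_changed_version(history))  # → 5
```

Because the rescaling is relative to the life-cycle itself, a jump that looks small on a fixed, static scale can still stand out clearly on the dynamic scale, which is what makes the two scales complementary.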
Quantitative metrics are generally applied by scientists to measure and assess the properties of data and knowledge resources. In ontology engineering, a number of metrics have been developed in the last few years to analyse different features of ontologies. However, this community has not generated a standard framework for studying the properties of ontologies, nor sufficient knowledge about the usefulness and validity of these metrics as measurement instruments for evaluating and comparing ontologies. Recently, 19 ontology structural metrics were studied using the OBO Foundry and AgroPortal ontology repositories. This study was based on how each metric partitioned the two datasets into five groups by applying the k-means algorithm. The results suggested that the use of five clusters for every metric might be suboptimal. In this paper, we propose an automated process for the study of ontology structural metrics that includes the selection of an optimal number of clusters for each metric. This optimal number is obtained automatically from statistical properties of the generated clusters. Moreover, the cosine similarity is used to estimate the similarity of two repositories from the perspective of the behaviour of the same set of metrics. The results on the two datasets allow for a more realistic perspective on the behaviour of the metrics. We also show and discuss the differences observed in the comparative behaviour of the metrics on the two repositories when using the optimal number of clusters instead of a predetermined number for every metric. The proposed method is not specific to ontology metrics and can therefore be applied to other types of metrics. Index Terms: Knowledge-based systems, knowledge engineering, clustering methods, biomedical informatics, biomedical ontologies, quality metrics
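The repository comparison step can be sketched as follows. This is a hypothetical illustration with invented numbers, not the authors' code: each repository is summarised as a vector of per-metric statistics (for example, the optimal cluster count found for each metric), and the two vectors are compared with the cosine similarity.

```python
import math

# Hypothetical sketch: compare two ontology repositories through the
# behaviour of the same set of metrics, using the cosine similarity of
# per-metric summary vectors. All numbers below are invented.

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# e.g. the optimal number of clusters per metric in each repository
obo_foundry = [4, 3, 5, 2, 4]
agroportal = [4, 2, 5, 3, 4]
print(round(cosine_similarity(obo_foundry, agroportal), 3))  # → 0.986
```

A value near 1 would indicate that the metrics behave similarly on the two repositories, while lower values would point to repository-specific behaviour of the kind discussed above.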
Background The increasing adoption of ontologies in biomedical research and the growing number of available ontologies have made it necessary to assure the quality of these resources. Most of the well-established ontologies, such as the Gene Ontology or SNOMED CT, have their own quality assurance processes. These have demonstrated their usefulness for the maintenance of the resources but are unable to detect all of the modelling flaws in the ontologies. Consequently, efficient and effective quality assurance methods are needed. Methods Here, we propose a series of quantitative metrics, based on the processing of the lexical regularities existing in the content of the ontology, to analyse readability and structural accuracy. The readability metrics account for the ratio of labels, descriptions, and synonyms associated with the ontology entities. The structural accuracy metrics evaluate how well two ontology modelling best practices are followed: (1) lexically suggest locally define (LSLD), that is, whether what is expressed in natural language for humans is also available as logical axioms for machines; and (2) systematic naming, which accounts for the amount of label content shared by the classes in a given taxonomy. Results We applied the metrics to different versions of SNOMED CT. Both readability and structural accuracy metrics remained stable over time but could capture some changes in the modelling decisions in SNOMED CT. The value of the LSLD metric increased from 0.27 to 0.31, and the value of the systematic naming metric was around 0.17. We analysed the readability and structural accuracy in the SNOMED CT July 2019 release. The results showed that the fulfilment of the structural accuracy criteria varied among the SNOMED CT hierarchies. The value of the metrics for the hierarchies was in the range of 0–0.92 (LSLD) and 0.08–1 (systematic naming). We also identified the cases that did not meet the best practices.
Conclusions We generated useful information about the engineering of the ontology, making the following contributions: (1) a set of readability metrics, (2) the use of lexical regularities to define structural accuracy metrics, and (3) the generation of quality assurance information for SNOMED CT.
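The readability metrics described above, the ratio of entities carrying labels, descriptions, and synonyms, can be sketched as follows. This is a hypothetical illustration over a tiny in-memory class list; the field names and example classes are invented for the sketch and are not taken from SNOMED CT.

```python
# Hypothetical sketch of the readability metrics: the fraction of
# ontology classes that carry a label, a textual description, and at
# least one synonym. The example classes below are invented.

def readability_ratios(classes):
    n = len(classes)
    return {
        "labels": sum(1 for c in classes if c.get("label")) / n,
        "descriptions": sum(1 for c in classes if c.get("description")) / n,
        "synonyms": sum(1 for c in classes if c.get("synonyms")) / n,
    }

classes = [
    {"label": "myocardial infarction",
     "description": "Necrosis of the myocardium caused by ischaemia.",
     "synonyms": ["heart attack", "MI"]},
    {"label": "disorder of the heart",
     "description": None,
     "synonyms": []},
]
print(readability_ratios(classes))
# → {'labels': 1.0, 'descriptions': 0.5, 'synonyms': 0.5}
```

Ratios close to 1 would indicate that human-readable content is systematically provided for the entities, which is the readability property the metrics above are designed to capture.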