Ontology matching is an essential problem in the Semantic Web and other distributed, open-world applications. Heterogeneity arises from differences in tools, knowledge, habits, language, interests and, often, level of detail. Automated matchers implementing diverse alignment techniques and similarity measures have been developed, and they perform well. However, in some use cases automated linking fails, and a human must decide whether or not to create a link. In this paper we present Alignment, a collaborative, system-aided, interactive ontology matching platform. Alignment offers a user-friendly environment for matching two ontologies with the aid of configurable similarity algorithms.
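To make "system-aided matching with configurable similarity algorithms" concrete, here is a minimal sketch, assuming entities are compared only by their labels. The function names `label_ratio` and `suggest_alignments`, the `threshold` parameter, and the choice of Python's `difflib.SequenceMatcher` ratio as the similarity measure are illustrative assumptions, not Alignment's actual API. The system only proposes candidate links; a human accepts or rejects each one.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# Hypothetical similarity signature: two labels in, a score in [0, 1] out.
Similarity = Callable[[str, str], float]


def label_ratio(a: str, b: str) -> float:
    """Normalized edit-based similarity of two labels (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_alignments(
    source: Dict[str, str],
    target: Dict[str, str],
    similarity: Similarity = label_ratio,  # swap in any other measure here
    threshold: float = 0.8,                # configurable acceptance cut-off
) -> List[Tuple[str, str, float]]:
    """Propose (source_id, target_id, score) candidates for human review.

    `source` and `target` map entity identifiers to labels. Pairs scoring at
    or above `threshold` are returned best-first; the decision to create a
    link is left to the user.
    """
    candidates = []
    for s_id, s_label in source.items():
        for t_id, t_label in target.items():
            score = similarity(s_label, t_label)
            if score >= threshold:
                candidates.append((s_id, t_id, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    onto_a = {"a:Person": "Person", "a:Organisation": "Organisation"}
    onto_b = {"b:person": "person", "b:Organization": "Organization", "b:Place": "Place"}
    for s, t, score in suggest_alignments(onto_a, onto_b):
        print(f"{s} -> {t}: {score:.2f}  (accept or reject?)")
```

Passing the similarity function as a parameter is one simple way to make the measure configurable; a real platform would typically combine several measures (lexical, structural, instance-based) rather than rely on a single string ratio.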
The use of Named Entity Recognition tools on domain-specific corpora is often hampered by insufficient training data. We investigate an approach to produce fine-grained named entity annotations of a large corpus of Austrian court decisions from a small manually annotated training data set. We first apply a general-purpose Named Entity Recognition model to produce annotations of common coarse-grained types. Next, domain experts manually inspect a small sample of these annotations to produce an initial fine-grained training data set. To make efficient use of this small manually annotated data set, we formulate named entity typing as a binary classification task: for each originally annotated occurrence of an entity and for each fine-grained type, we verify whether the entity belongs to that type. For this purpose we train a transformer-based classifier. We randomly sample 547 predictions and evaluate them manually. The incorrect predictions are then used to improve the classifier: their corrected annotations are added to the training set. The experiments show that re-training with even a very small number (5 or 10) of originally incorrect predictions can significantly improve classifier performance. Finally, we train the classifier on all available data and re-annotate the whole data set.
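The binary reformulation of named entity typing can be sketched as a data transformation: every (occurrence, fine-grained type) pair becomes one yes/no example, so a few expert-verified mentions yield many training instances. This is a minimal sketch under stated assumptions. The `Mention` and `BinaryExample` structures, the text-pair format produced by `classifier_input`, and the fine-grained type names used in the example are hypothetical and are not taken from the paper. The transformer classifier itself is not shown; `add_corrections` only illustrates folding expert-corrected predictions back into the training set.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Mention:
    """One entity occurrence produced by the general-purpose NER model."""
    text: str         # surface form of the entity
    coarse_type: str  # coarse label from the NER model, e.g. "ORG"
    context: str      # surrounding sentence


@dataclass(frozen=True)
class BinaryExample:
    mention: Mention
    fine_type: str
    label: Optional[int]  # 1 = belongs, 0 = does not, None = still to predict


def to_binary_examples(
    mentions: Iterable[Mention],
    fine_types: List[str],
    gold: Optional[Dict[Mention, Set[str]]] = None,
) -> List[BinaryExample]:
    """Expand each mention into one yes/no example per fine-grained type.

    `gold` optionally maps a mention to its expert-assigned fine types; for
    mentions found there, each (mention, type) pair gets a 0/1 label.
    """
    examples = []
    for m in mentions:
        for t in fine_types:
            label = None
            if gold is not None and m in gold:
                label = int(t in gold[m])
            examples.append(BinaryExample(m, t, label))
    return examples


def classifier_input(ex: BinaryExample) -> Tuple[str, str]:
    """Text pair a sequence-pair classifier could score (assumed format)."""
    m = ex.mention
    return (f"{m.text} [{m.coarse_type}] | {m.context}", ex.fine_type)


def add_corrections(
    train: List[BinaryExample], corrected: List[BinaryExample]
) -> List[BinaryExample]:
    """Add expert-corrected examples; a correction overrides any earlier label
    for the same (mention, type) pair, and new pairs are appended."""
    merged = {(e.mention, e.fine_type): e for e in train}
    for e in corrected:
        merged[(e.mention, e.fine_type)] = e
    return list(merged.values())
```

For example, a mention `Mention("Oberster Gerichtshof", "ORG", "The Oberster Gerichtshof dismissed the appeal.")` with expert types `{"court"}` and hypothetical fine types `["court", "public_authority", "company"]` expands into three examples labelled 1, 0 and 0. Deduplicating on the (mention, type) pair in `add_corrections` is a design choice that keeps a corrected label from coexisting with a stale one.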