Davide Colla scite author profile

Historical archives represent an immense wealth, the potential of which is endangered by the lack of effective management and access tools. We believe that this issue can be addressed by providing archive catalogs with a semantic layer, containing rich semantic metadata, representing the content of documents in a full-fledged formal machine-readable format. In this paper we present the contribution offered in this direction by the PRiSMHA project, in which the conceptual vocabulary of the semantic layer is represented by computational ontologies. However, acquiring semantic knowledge represents a well-known bottleneck for knowledge-based systems: in order to solve this problem, PRiSMHA relies on a crowdsourcing collaborative model, i.e., an online community of users who collaborate in building semantic representations of the content of archival documents. In this perspective, this paper aims at answering the following research question: Starting from the axioms characterizing concepts in the computational ontology underlying the system, how can we derive a user interface enabling users to formally represent the content of archival documents by exploiting the conceptual vocabulary provided by the ontology? Our solution includes the following steps: (a) a manually defined configuration, acting as a pre-filter, to hide "unsuited" classes, properties, and relations; (b) an algorithm, combining heuristics and reasoning, which extracts from the ontology all and only the "compatible" properties and relations, given an entity (event) type; (c) a set of strategies to rank, group, and present the entity (event) properties and relations, based on the results of a study with users. This integrated solution enabled us to design an ontology-driven user interface that lets users characterize entities, and in particular (historical) events, on the basis of the vocabulary provided by the ontology.
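Steps (a) and (b) of the abstract above can be pictured with a minimal sketch. It is not the PRiSMHA algorithm: the class hierarchy, the property domains, and the hidden-property list are hypothetical toy data, and "compatible" is approximated here as "the property's domain is the entity type or one of its ancestors".

```python
# Hypothetical toy ontology: child class -> parent class.
SUBCLASS_OF = {"Strike": "Event", "Event": "Entity", "Person": "Entity"}

# Hypothetical properties, each with the class it is declared on (its domain).
PROPERTY_DOMAIN = {
    "hasParticipant": "Event",
    "hasLocation": "Event",
    "hasName": "Entity",
    "hasBirthDate": "Person",
    "internalId": "Entity",
}

# Step (a): manually defined pre-filter of properties to hide from users.
HIDDEN = {"internalId"}


def ancestors(cls):
    """Return the class followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def compatible_properties(entity_type):
    """Step (b), simplified: properties whose domain lies on the type's lineage."""
    lineage = set(ancestors(entity_type))
    return sorted(
        prop
        for prop, domain in PROPERTY_DOMAIN.items()
        if domain in lineage and prop not in HIDDEN
    )


print(compatible_properties("Strike"))
```

For the toy type `Strike`, the sketch keeps the properties inherited from `Event` and `Entity`, drops the `Person`-only property, and hides `internalId`. A real system would rely on an ontology reasoner rather than a hand-written parent map.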


Semantic coherence markers: The contribution of perplexity metrics

Colla

Delsanto

Agosto

et al. 2022

Artificial Intelligence in Medicine


Violence detection explanation via semantic roles embeddings

Mensa

Colla

Dalmasso

et al. 2020

BMC Med Inform Decis Mak


Background: Emergency room reports pose specific challenges to natural language processing techniques. In this setting, episodes of violence against women, the elderly, and children are often under-reported. Categorizing textual descriptions as containing violence-related injuries (V) vs. non-violence-related injuries (NV) is thus a relevant task for devising alerting mechanisms to track (and prevent) violence episodes. Methods: We present ViDeS (so dubbed after Violence Detection System), a system to detect episodes of violence from narrative texts in emergency room reports. It employs a deep neural network for categorizing textual ER report data, and complements such output by making explicit which elements corroborate the interpretation of the record as reporting violence-related injuries. To these ends we designed a novel hybrid technique for filling semantic frames that employs distributed representations of terms, along with syntactic and semantic information. The system has been validated on real data annotated with two sorts of information: about the presence vs. absence of violence-related injuries, and about some semantic roles that can be interpreted as major cues for violent episodes, such as the agent that committed violence, the victim, the body district involved, etc. The employed dataset contains over 150K records annotated with class (V, NV) information, and 200 records with finer-grained information on the aforementioned semantic roles. Results: We used data coming from an Italian branch of the EU-Injury Database (EU-IDB) project, compiled by hospital staff. Categorization figures approach full precision and recall for negative cases and .97 precision and .94 recall on positive cases. As regards the recognition of semantic roles, we recorded an accuracy varying from .28 to .90 according to the semantic roles involved. Moreover, the system allowed unveiling annotation errors committed by hospital staff.
Conclusions: Explaining systems’ results, so as to make their output more comprehensible and convincing, is today necessary for AI systems. Our proposal is to combine distributed and symbolic (frame-like) representations as a possible answer to this pressing demand for interpretability. Although presently focused on the medical domain, the proposed methodology is general and, in principle, it can be extended to further application areas and categorization tasks.


LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items

Colla

Mensa

Radicioni

2020

Computational Linguistics


We present LESSLEX, a novel multilingual lexical resource. Unlike the vast majority of existing approaches, we ground our embeddings on a sense inventory made available from the BabelNet semantic network. In this setting, multilingual access is governed by the mapping of terms onto their underlying sense descriptions, such that all vectors co-exist in the same semantic space. As a result, for each term we thus have the “blended” terminological vector along with those describing all senses associated with that term. LESSLEX has been tested on three tasks relevant to lexical semantics: conceptual similarity, contextual similarity, and semantic text similarity. We experimented over the principal data sets for such tasks in their multilingual and crosslingual variants, improving on or closely approaching state-of-the-art results. We conclude by arguing that LESSLEX vectors may be relevant for practical applications and for research on conceptual and lexical access and competence.
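To make the idea of a term carrying both a blended vector and per-sense vectors concrete, here is a minimal sketch. The vectors, sense labels, and the max-over-sense-pairs scoring rule are illustrative assumptions, not the LESSLEX data or its published similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy resource: one blended vector plus one vector per sense.
RESOURCE = {
    "bank": {
        "blended": [0.5, 0.5, 0.0],
        "senses": {"bank#finance": [1.0, 0.0, 0.0], "bank#river": [0.0, 1.0, 0.0]},
    },
    "money": {
        "blended": [0.9, 0.0, 0.1],
        "senses": {"money#currency": [0.9, 0.0, 0.1]},
    },
}


def blended_similarity(term_a, term_b):
    """Similarity computed on the term-level (blended) vectors only."""
    return cosine(RESOURCE[term_a]["blended"], RESOURCE[term_b]["blended"])


def max_sense_similarity(term_a, term_b):
    """Best-scoring sense pair, returned as (score, sense_a, sense_b)."""
    return max(
        (cosine(vec_a, vec_b), sense_a, sense_b)
        for sense_a, vec_a in RESOURCE[term_a]["senses"].items()
        for sense_b, vec_b in RESOURCE[term_b]["senses"].items()
    )


score, sense_a, sense_b = max_sense_similarity("bank", "money")
print(sense_a, sense_b, round(score, 3), round(blended_similarity("bank", "money"), 3))
```

In this toy data the financial sense of "bank" matches "money" more closely than the blended vector does, which is the kind of distinction sense-level vectors are meant to expose.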


Fruitful Synergies between Computer Science, Historical Studies and Archives: The Experience in the PRiSMHA Project

Goy

Accornero

Astrologo

et al. 2019


Tell Me Why: Computational Explanation of Conceptual Similarity Judgments

Colla

Mensa

Radicioni

et al. 2018


In this paper we introduce a system for the computation of explanations that accompany scores in the conceptual similarity task. In this setting the problem is, given a pair of concepts, to provide a score that expresses how similar the two concepts are. In order to explain how explanations are automatically built, we illustrate some basic features of COVER, the lexical resource that underlies our approach, and the main traits of the MeRaLi system, which computes conceptual similarity and explanations, all in one. To assess the computed explanations, we designed an experiment with human participants, which provided interesting and encouraging results that we report and discuss in depth.


Semantic Measures for Keywords Extraction

Colla

Mensa

Radicioni

2017


In this paper we introduce a minimalist hypothesis for keywords extraction: keywords can be extracted from text documents by considering concepts underlying document terms. Furthermore, central concepts are identified as the concepts that are more related to title concepts. Namely, we propose five metrics, diverse in essence, to compute the centrality of concepts in the document body with respect to those in the title. We finally report on an experiment over a popular data set of human-annotated news articles; the results confirm the soundness of our hypothesis.
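The hypothesis above (body concepts are central when they are closely related to title concepts) can be sketched as follows. The abstract does not specify the five metrics, so this sketch substitutes a single stand-in: Jaccard overlap between hypothetical feature sets, averaged over the title concepts. All concept names and features are invented for illustration.

```python
# Hypothetical concept descriptions: concept -> set of descriptive features.
CONCEPT_FEATURES = {
    "earthquake": {"disaster", "geology", "damage"},
    "tremor": {"geology", "damage", "motion"},
    "rescue": {"disaster", "emergency"},
    "football": {"sport", "game"},
}


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_title_centrality(title_concepts, body_concepts):
    """Rank body concepts by their mean relatedness to the title concepts."""
    scores = {}
    for concept in body_concepts:
        relatedness = [
            jaccard(CONCEPT_FEATURES[concept], CONCEPT_FEATURES[title])
            for title in title_concepts
        ]
        scores[concept] = sum(relatedness) / len(relatedness) if relatedness else 0.0
    # Highest-scoring concepts first: these are the keyword candidates.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_title_centrality(["earthquake"], ["football", "rescue", "tremor"])
print(ranking)
```

With the toy title concept `earthquake`, the body concept `tremor` ranks first, `rescue` second, and the unrelated `football` last.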


Wikidata Support in the Creation of Rich Semantic Metadata for Historical Archives

et al. 2021


The research question this paper aims at answering is the following: In an ontology-driven annotation system, can the information extracted from external resources (namely, Wikidata) provide users with useful suggestions in the characterization of entities used for the annotation of documents from historical archives? The context of the research is the PRiSMHA project, in which the main goal is the development of a proof-of-concept prototype ontology-driven system for semantic metadata generation. The assumption behind this effort is that effective access to historical archives requires rich semantic knowledge, relying on a domain ontology, that describes the content of archival resources. In the paper, we present a new feature of the annotation system: when characterizing a new entity (e.g., a person), some properties describing it are automatically pre-filled, and more complex semantic representations (e.g., events the entity is involved in) are suggested; both kinds of suggestions are based on information retrieved from Wikidata. In the paper, we describe the automatic algorithm devised to support the definition of the mappings between the Wikidata semantic model and the PRiSMHA ontology, as well as the process used to extract information from Wikidata and to generate suggestions based on the defined mappings. Finally, we discuss the results of a qualitative evaluation of the suggestions, which provides a positive answer to the initial research question and indicates possible improvements.



Building Semantic Metadata for Historical Archives through an Ontology-driven User Interface
