João Rocha da Silva scite author profile

Over the past years, the amount of online offensive speech has been growing steadily. To cope with it, machine learning is applied. However, ML-based techniques require sufficiently large annotated datasets. In recent years, several datasets have been published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different levels of expertise. First, non-experts annotated the tweets with binary labels ('hate' vs. 'no-hate'). Then, expert annotators classified the tweets following a fine-grained hierarchical multiple-label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. The hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried out a baseline classification experiment with pre-trained word embeddings and an LSTM on the binary-classified data, with a state-of-the-art outcome.
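The abstract reports inter-annotator agreement without naming the statistic used. As a minimal sketch, assuming Cohen's kappa (one common choice for two annotators) and hypothetical toy labels that are not taken from the dataset, agreement on the binary 'hate' vs. 'no-hate' scheme could be computed as follows:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations (assumption, for illustration only).
annotator_1 = ["hate", "no-hate", "no-hate", "hate", "no-hate", "no-hate"]
annotator_2 = ["hate", "no-hate", "hate", "hate", "no-hate", "no-hate"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Computing the statistic separately for each category would show the kind of category-to-category variation the abstract describes.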

A comparison of research data management platforms: architecture, flexible metadata and interoperability

Amorim, Castro, Silva, et al. 2016. Univ Access Inf Soc

Research data management is rapidly becoming a regular concern for researchers, and institutions need to provide them with platforms to support data organization and preparation for publication. Some institutions have adopted institutional repositories as the basis for data deposit, whereas others are experimenting with richer environments for data description, in spite of the diversity of existing workflows. This paper is a synthetic overview of current platforms that can be used for data management purposes. Adopting a pragmatic view on data management, the paper focuses on solutions that can be adopted in the long tail of science, where investments in tools and manpower are modest. First, a broad set of data management platforms is presented (some designed for institutional repositories and digital libraries) to select a short list of the more promising ones for data management. These platforms are compared considering ... This paper is an extended version of a previously published comparative study. Please refer to the WCIST 2015 conference proceedings.

A Comparative Study of Platforms for Research Data Management: Interoperability, Metadata Capabilities and Integration Potential

Amorim, Castro, Silva, et al. 2015

Research data management is acknowledged as an important concern for institutions, and several platforms to support data deposits have emerged. In this paper we start by overviewing the current practices in the data management workflow and identifying the stakeholders in this process. We then compare four recently proposed data repository platforms (DSpace, CKAN, Zenodo and Figshare), considering their architecture, support for metadata, API completeness, as well as their search mechanisms and community acceptance. To evaluate these features, we take into consideration the identified stakeholders' requirements. In the end, we argue that, depending on local requirements, different data repositories can meet some of the stakeholders' requirements. Nevertheless, there is still room for improvement, mainly regarding compatibility with the description of data from different research domains, to further improve data reuse.

Ontology-based multi-domain metadata for research data management using triple stores

Silva, Ribeiro, Lopes. 2014

Most current research data management solutions rely on a fixed set of descriptors (e.g. Dublin Core Terms) for the description of the resources that they manage. These are easy to understand and use, but their semantics are limited to general concepts, leaving out domain-specific metadata. The textual values for descriptors are easily indexed through free-text indexes, but faceted search and dataset interlinking become limited. From the point of view of the relational database schema modeler, designing a more flexible metadata model represents a non-trivial challenge because it means representing entities with attributes that are unknown at the time of modeling and that can change over time. Those traits, combined with the presence of hierarchies among the entities, can make the relational schema quite complex. This work demonstrates the approaches followed by current open-source platforms and proposes a graph-based model for achieving modular, ontology-based metadata for interlinked data assets in the Semantic Web. The proposed model was implemented in a collaborative research data management platform currently under development at the University of Porto.
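To make the contrast with a fixed relational schema concrete, here is a minimal in-memory sketch of the triple idea. It is not the model or the implementation from the paper: the `TripleStore` class, the `ex:` resource identifiers and the `HYDRO` namespace with its `catalyst` descriptor are hypothetical, and only the Dublin Core Terms namespace and its `title` and `isPartOf` terms are real.

```python
from collections import Counter

DCTERMS = "http://purl.org/dc/terms/"   # real Dublin Core Terms namespace
HYDRO = "http://example.org/hydrogen#"  # hypothetical domain ontology


class TripleStore:
    """Toy store: every statement is a (subject, predicate, object) triple."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        # A new descriptor is just a new predicate; no schema change needed.
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )

    def facet(self, p):
        """Count how many statements carry each value of predicate p."""
        return Counter(o for _, _, o in self.match(p=p))


store = TripleStore()
store.add("ex:folder1", DCTERMS + "title", "Hydrogen runs 2014")
for dataset, title, catalyst in [("ex:dataset1", "Run A", "nickel"),
                                 ("ex:dataset2", "Run B", "platinum")]:
    store.add(dataset, DCTERMS + "title", title)           # generic descriptor
    store.add(dataset, DCTERMS + "isPartOf", "ex:folder1")  # hierarchy as a link
    store.add(dataset, HYDRO + "catalyst", catalyst)        # domain descriptor
store.add("ex:dataset3", HYDRO + "catalyst", "nickel")

facets = store.facet(HYDRO + "catalyst")
children = [s for s, _, _ in store.match(p=DCTERMS + "isPartOf", o="ex:folder1")]
```

Generic and domain-specific descriptors sit side by side, the folder hierarchy is an ordinary link, and a facet is a count over one predicate. A production system would use an actual triple store queried with SPARQL rather than a Python set.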

Hands-On Data Publishing with Researchers: Five Experiments with Metadata in Multiple Domains

Rodrigues, Castro, Silva, et al. 2019

Dendro: Collaborative Research Data Management Built on Linked Open Data

Silva, Castro, Ribeiro, et al. 2014

Knowledge Graph Implementation of Archival Descriptions Through CIDOC-CRM

Koch, Freitas, Ribeiro, et al. 2019

Archives have well-established description standards, namely ISAD(G) and ISAAR(CPF), with a hierarchical structure adapted to the nature of archival assets. However, as archives connect to a growing diversity of data, they aim to make their representations better suited to the so-called linked data cloud. The corresponding move from hierarchical, ISAD-conforming descriptions to graph counterparts requires state-of-the-art technologies, data models and vocabularies. Our approach addresses this problem from two perspectives. The first concerns the data model and description vocabularies, as we adopt and build upon the CIDOC-CRM standard. The second is the choice of technologies to support a knowledge graph, including a graph database and an Object Graph Mapping library. The case study is the Portuguese National Archives, Torre do Tombo, and the overall goal is to build a CIDOC-CRM-compliant system for document description and retrieval, to be used by professionals and the public. The early stages described here include the design of the core data model for archival records, represented as the ArchOnto ontology, and its embodiment in the ArchGraph knowledge graph. The goal of a semantic archival information system will be pursued through the migration of existing records to the richer representation and the development of applications supported by the graph.

Involving Data Creators in an Ontology-Based Design Process for Metadata Models

Castro, Amorim, Gattelli, et al. 2017

Research data are the cornerstone of science and their current fast rate of production is disquieting researchers. Adequate research data management strongly depends on accurate metadata records that capture the production context of the datasets, thus enabling data interpretation and reuse. This chapter reports on the authors' experience in developing metadata models, formalized as ontologies, for several research domains, involving members of small research teams in the overall process. This process is instantiated with four case studies: vehicle simulation, hydrogen production, biological oceanography and social sciences. The authors also present a data description workflow that includes a research data management platform, named Dendro, where researchers can prepare their datasets for further deposit in external data repositories.
