The damaging effects of hate speech on social media have become evident in recent years, and several organizations, researchers and the social media platforms themselves have tried to curb them, without great success. Recently, following the advent of deep learning, several novel approaches have appeared in the field of hate speech detection. However, such approaches depend on large-scale datasets in order to exhibit competitive performance. In this paper, we present a novel, publicly available collection of datasets in five different languages, consisting of tweets referring to journalism-related accounts and including high-quality human annotations for hate speech and personal attack. To build the datasets, we follow a concise annotation strategy and employ an active learning approach. Additionally, we present a number of state-of-the-art deep learning architectures for hate speech detection and use these datasets to train and evaluate them. Finally, we propose an ensemble model that outperforms all of the individual models.
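The ensembling idea can be illustrated with a minimal sketch, assuming a simple soft-voting scheme over the trained classifiers (the paper's actual combination method may differ); the model types and probability values below are hypothetical:

```python
def soft_vote(model_probs):
    """Average class-probability vectors from several hate speech
    classifiers and return the averaged distribution."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]

# Hypothetical (not-hate, hate) probabilities from three trained models
# for the same tweet.
probs = [
    [0.60, 0.40],  # e.g. a CNN-based classifier
    [0.30, 0.70],  # e.g. an LSTM-based classifier
    [0.40, 0.60],  # e.g. a transformer-based classifier
]
avg = soft_vote(probs)
label = max(range(len(avg)), key=avg.__getitem__)  # index of the winning class
```

Soft voting lets a confident minority of models outvote a weakly confident majority, which is one reason averaged probabilities often beat hard majority voting.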
This paper proposes a machine learning approach to part-of-speech tagging and named entity recognition for Greek, focusing on the extraction of morphological features and the classification of tokens into a small set of named entity classes. The model architecture that was used is introduced. Greek language support was added to the source code of the spaCy platform, a feature that did not exist before our contribution, and was used for building the models. Additionally, a part-of-speech tagger was trained that can detect the morphology of tokens and that outperforms state-of-the-art results when classifying the part of speech alone. For named entity recognition with spaCy, a model that extends the standard ENAMEX types (organization, location, person) was built. The experiments that were conducted indicate the need for flexibility in handling out-of-vocabulary words, and an effort to resolve this issue is underway. Finally, the evaluation results are discussed.
The success of deep learning (DL) in various areas, such as computer vision, has fueled interest in several novel DL-enabled applications, such as financial trading, which could potentially surpass previously used approaches. Indeed, a plethora of DL-based trading methods have been proposed in recent years. Despite their success, these methods typically rely on a very restricted set of information, usually employing only price-related inputs. As a result, they ignore sentiment-related information, which can have a profound impact on, and be a strong predictor of, various assets, such as cryptocurrencies. The contribution of this paper is manifold. First, we examine whether the use of sentiment information, as extracted from various online sources, including news articles, is beneficial when training DL agents for trading. Then, given the difficulty of training reliable sentiment extractors for financial applications, we evaluate the impact of using different DL models as sentiment extractors, as well as employ an unsupervised training pipeline to further improve their performance. Finally, we propose an effective multisource sentiment fusion approach that improves performance over the rest of the evaluated approaches. The experiments were conducted using several different configurations and models, ranging from multilayer perceptrons (MLPs) to convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to provide a reliable evaluation of sentiment-aware DL-based trading strategies. They provide evidence that, for Bitcoin, sentiment information might be a stronger predictor than the information contained in the actual price time series.
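The multisource fusion step can be sketched minimally as a reliability-weighted average of per-source sentiment scores; the source names, weights, and scores below are hypothetical, and the fusion approach actually proposed in the paper may be more involved:

```python
def fuse_sentiment(scores, weights):
    """Combine per-source sentiment scores (each in [-1, 1]) into a single
    value via a weighted average; a higher weight means a more trusted source."""
    total = sum(weights[s] for s in scores)
    return sum(scores[s] * weights[s] for s in scores) / total

# Hypothetical per-source Bitcoin sentiment for one trading day.
scores = {"news": 0.20, "twitter": -0.40, "reddit": 0.10}
weights = {"news": 2.0, "twitter": 1.0, "reddit": 1.0}

fused = fuse_sentiment(scores, weights)
# The fused value would then be appended to the price-based features
# fed to the DL trading agent.
```

Weighting sources separately allows the pipeline to down-weight noisier extractors (e.g. social media text) relative to more reliable ones (e.g. edited news articles).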
Purpose
This paper aims to provide an overview of the copyright legal framework for audiovisual resources in Europe and Greece, of how audiovisual (AV) content is currently licensed by Greek providers, and of how licenses or copyright exceptions enable its reuse. The motivation for this work was the development of an aggregation service for audiovisual resources in Greece, the Open AudioVisual Archives (OAVA) platform.
Design/methodology/approach
Copyright licenses and exceptions in the European Union and in Greek legislation have been thoroughly reviewed, along with the reuse of content based on the terms of Fair Use, Rights Statements and Creative Commons. Licensing practices of the most well-known aggregation services, such as Europeana, the Digital Public Library of America, Trove, Digital New Zealand and the National Digital Library of India, have also been studied and considered. Audiovisual content providers in Greece have been recorded, and their licensing preferences have been analyzed. Pearson's chi-square test was applied to test the relationship between provider type, resource genre and the licenses used.
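The statistical check can be reproduced with a small standalone sketch; the contingency counts and row/column categories below are hypothetical, chosen only to illustrate the test (in practice one would use `scipy.stats.chi2_contingency` on the recorded provider data):

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic for an r x c contingency table
    of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: provider type (rows) vs. license choice (columns).
#            CC license   default platform license
table = [[30, 10],   # e.g. universities
         [15, 25]]   # e.g. broadcasters

stat = chi_square_statistic(table)  # df = (2-1) * (2-1) = 1
# A statistic above 3.841 (the 5% critical value for df = 1) would
# suggest that license choice depends on provider type.
```

The statistic compares each observed cell count against the count expected under independence of rows and columns, so large values signal an association between provider type and licensing preference.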
Findings
Despite the abundance of copyright legislation in the European Union and in Greece, audiovisual content providers in Greece seem to ignore it or find it difficult to choose the right license. More than half of them choose to publish their resources on popular audiovisual platforms using the default licensing option provided. Creative Commons licenses are preferred for audiovisual content that falls into the following categories: open courses (almost exclusively) and interviews and digital collection/research projects (about half of the content).
Originality/value
This paper examines audiovisual content aggregation in the EU and Greece from a legal point of view. To the best of the authors' knowledge, it is the first attempt to record and analyze the licensing preferences of Greek AV content providers.