Textual similarity is a crucial aspect of many extractive text summarization methods. A bag-of-words representation cannot capture the semantic relationships between concepts when comparing strongly related sentences that have no words in common. To overcome this issue, in this paper we propose a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings. Evaluations on multi-document and multilingual datasets show the effectiveness of the continuous vector representation of words compared to the bag-of-words model. Despite its simplicity, our method achieves good performance even in comparison to more complex deep learning models. Our method is unsupervised and can be adopted in other summarization tasks.
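The idea of scoring sentences against an embedding centroid can be illustrated with a minimal sketch. It is not the paper's implementation: the two-dimensional vectors in `TOY_EMBEDDINGS` are invented, whitespace tokenization is assumed, and the centroid is built here from every known word in the document, whereas a real system would likely select centroid words more carefully and use pretrained embeddings. The helper names `embed`, `cosine`, and `summarize` are hypothetical.

```python
import math

# Toy 2-dimensional embeddings, invented purely for illustration.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "kitten": [0.9, 0.1],
    "feline": [0.8, 0.2],
    "stock": [0.0, 1.0],
    "market": [0.1, 0.9],
}


def embed(words, embeddings):
    """Compose a text vector by summing the vectors of its known words."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word in words:
        vec = embeddings.get(word)
        if vec is not None:  # out-of-vocabulary words are skipped
            total = [a + b for a, b in zip(total, vec)]
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def summarize(sentences, embeddings, k=1):
    """Return the k sentences closest to the document centroid, in source order."""
    tokenized = [s.lower().split() for s in sentences]
    # Assumption: the centroid is the sum of all known word vectors in the document.
    centroid = embed([w for toks in tokenized for w in toks], embeddings)
    scored = [
        (cosine(embed(toks, embeddings), centroid), i)
        for i, toks in enumerate(tokenized)
    ]
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

With these toy vectors, "cat" and "kitten" share no surface form yet have a high cosine similarity, which is the kind of relationship a bag-of-words comparison would miss.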
The availability of large annotated corpora from social media and the development of powerful classification approaches have contributed in an unprecedented way to tackling the challenge of monitoring users’ opinions and sentiments in online social platforms over time. Such linguistic data are strongly affected by events and topic discourse, and this aspect is crucial when detecting phenomena such as hate speech, especially from a diachronic perspective. We address this challenge by focusing on a real case study: the “Contro l’odio” platform for monitoring hate speech against immigrants in the Italian Twittersphere. We explored the temporal robustness of a BERT model for Italian (AlBERTo), the current benchmark in non-diachronic detection settings. We tested different training strategies to evaluate how classification performance is affected by adding more data that are temporally distant from the test set and hence potentially different in terms of topic and language use. Our analysis points out the limits that a supervised classification model encounters on data that are heavily influenced by events. Our results show that AlBERTo is highly sensitive to the temporal distance of the fine-tuning set. However, with an adequate time window, the performance increases, while requiring less annotated data than a traditional classifier.
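Restricting the fine-tuning set to a time window before the test period could look like the sketch below. The abstract does not specify how the windows were built, so everything here is an assumption: examples are taken to be `(date, text, label)` tuples, distance is measured in whole calendar months, and the helper names `months_between` and `select_window` are hypothetical.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def select_window(examples, test_start, window_months):
    """Keep examples dated before `test_start` and at most `window_months` earlier.

    Assumption: each example is a (date, text, label) tuple.
    """
    return [
        ex
        for ex in examples
        if ex[0] < test_start
        and months_between(ex[0], test_start) <= window_months
    ]
```

Widening `window_months` adds older examples to the training set, which is one way to probe how temporal distance from the test set affects a classifier.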
Conversational Recommender Systems (CoRSs) implement a paradigm that allows users to interact with the system in natural language to define their preferences and discover the items that best fit their needs. CoRSs can be straightforwardly implemented as chatbots, which are becoming increasingly popular for several applications, such as customer care, health care, and medical diagnoses. Chatbots implement an interaction based on natural language, buttons, or both. Implementing a chatbot is a challenging task since it requires knowledge of natural language processing and human-computer interaction. A CoRS might be particularly useful in the music domain, since music is generally enjoyed in contexts where a standard interface cannot be used (driving, doing homework, running). However, there is no work in the literature that analytically compares different interaction modes for a conversational music recommender system. In this paper, we focus on the design and implementation of a CoRS for the music domain. Our CoRS consists of different components. The system implements content-based recommendation, critiquing and adaptive strategies, as well as explanation facilities. The main innovative contribution is that the user can interact through different interaction modes: natural language, buttons, and mixed. Due to the lack of available datasets for testing CoRSs, we carried out an in vivo experimental evaluation with the goal of investigating the impact of the different interaction modes on the recommendation accuracy and on the cost of interaction for the final user. The experiment involved 110 people, of whom 54 completed the whole process. The analysis of the results shows that the best interaction mode is based on a mixed strategy that combines buttons and natural language. In addition, the results make clear which steps in the dialog are particularly strenuous for the user.