We study the problem of finding fake online news. This is an important problem as news of questionable credibility have recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online sources. We then propose a language-independent approach for automatically distinguishing credible from fake news, based on a rich feature set. In particular, we use linguistic (n-gram), credibility-related (capitalization, punctuation, pronoun use, sentiment polarity), and semantic (embeddings and DB-Pedia data) features. Our experiments on three different testsets show that our model can distinguish credible from fake news with very high accuracy.
No abstract
We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of various types of features: semantic, lexical, metadata, and user-related. The most important group turned out to be the metadata for the question and for the comment, semantic vectors trained on QatarLiving data and similarities between the question and the comment for subtasks A and C, and between the original and the related question for Subtask B.
The goal of stance detection is to determine the viewpoint expressed in a piece of text towards a target. These viewpoints or contexts are often expressed in many different languages depending on the user and the platform, which can be a local news outlet, a social media platform, a news forum, etc. Most research on stance detection, however, has been limited to working with a single language and on a few limited targets, with little work on cross-lingual stance detection. Moreover, non-English sources of labelled data are often scarce and present additional challenges. Recently, large multilingual language models have substantially improved the performance on many non-English tasks, especially such with a limited number of examples. This highlights the importance of model pre-training and its ability to learn from few examples. In this paper, we present the most comprehensive study of cross-lingual stance detection to date: we experiment with 15 diverse datasets in 12 languages from 6 language families, and with 6 low-resource evaluation settings each. For our experiments, we build on pattern-exploiting training (PET), proposing the addition of a novel label encoder to simplify the verbalisation procedure. We further propose sentiment-based generation of stance data for pre-training, which shows sizeable improvement of more than 6% F1 absolute in few-shot learning settings compared to several strong baselines.
Recent years have seen growing interest in conversational agents, such as chatbots, which are a very good fit for automated customer support because the domain in which they need to operate is narrow. This interest was in part inspired by recent advances in neural machine translation, esp. the rise of sequence-to-sequence (seq2seq) and attention-based models such as the Transformer, which have been applied to various other tasks and have opened new research directions in question answering, chatbots, and conversational systems. Still, in many cases, it might be feasible and even preferable to use simple information retrieval techniques. Thus, here we compare three different models: (i) a retrieval model, (ii) a sequence-to-sequence model with attention, and (iii) Transformer. Our experiments with the Twitter Customer Support Dataset, which contains over two million posts from customer support services of twenty major brands, show that the seq2seq model outperforms the other two in terms of semantics and word overlap.
Understanding attitudes expressed in texts, also known as stance detection, plays an important role in systems for detecting false information online, be it misinformation (unintentionally false) or disinformation (intentionally false information). Stance detection has been framed in different ways, including (a) as a component of fact-checking, rumour detection, and detecting previously fact-checked claims, or (b) as a task in its own right. While there have been prior efforts to contrast stance detection with other related tasks such as argumentation mining and sentiment analysis, there is no existing survey on examining the relationship between stance detection and mis-and disinformation detection. Here, we aim to bridge this gap by reviewing and analysing existing work in this area, with mis-and disinformation in focus, and discussing lessons learnt and future challenges.
We propose Eχαµs -a new benchmark dataset for cross-lingual and multilingual question answering for high school examinations. We collected more than 24,000 highquality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from Natural Sciences and Social Sciences, among others.Eχαµs offers a fine-grained evaluation framework across multiple languages and subjects, which allows precise analysis and comparison of various models. We perform various experiments with existing top-performing multilingual pre-trained models and we show that Eχαµs offers multiple challenges that require multilingual knowledge and reasoning in multiple domains. We hope that Eχαµs will enable researchers to explore challenging reasoning and knowledge transfer methods and pretrained models for school question answering in various languages which was not possible before. The data, code, pre-trained models, and evaluation are available at http:// github.com/mhardalov/exams-qa.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.