Momchil Hardalov scite author profile

We study the problem of finding fake online news. This is an important problem, as news of questionable credibility has recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online sources. We then propose a language-independent approach for automatically distinguishing credible from fake news, based on a rich feature set. In particular, we use linguistic (n-gram), credibility-related (capitalization, punctuation, pronoun use, sentiment polarity), and semantic (embeddings and DBpedia data) features. Our experiments on three different test sets show that our model can distinguish credible from fake news with very high accuracy.
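As a rough illustration of the credibility-related feature group named in the abstract above (capitalization, punctuation, pronoun use), here is a minimal Python sketch. The function name `credibility_features`, the pronoun list, and the exact ratios are assumptions made for this example; the abstract does not specify how the authors compute these features, and sentiment polarity is left out because it would need an external lexicon.

```python
import re
import string

# Hypothetical English pronoun list, for illustration only; the paper's
# actual lexicon (and its language-independent variant) is not given here.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}


def credibility_features(text: str) -> dict:
    """Compute a few surface-level credibility cues for one document."""
    tokens = re.findall(r"[A-Za-z']+", text)
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    return {
        # Share of tokens written entirely in upper case (e.g. "SHOCKING").
        "all_caps_ratio": sum(
            1 for t in tokens if len(t) > 1 and t.isupper()
        ) / n_tokens,
        # Share of characters that are ASCII punctuation marks.
        "punct_ratio": sum(
            1 for c in text if c in string.punctuation
        ) / n_chars,
        # Raw count of exclamation marks.
        "exclamation_count": text.count("!"),
        # Share of tokens that are pronouns from the assumed list.
        "pronoun_ratio": sum(
            1 for t in tokens if t.lower() in PRONOUNS
        ) / n_tokens,
    }
```

Features like these would typically be concatenated with n-gram and embedding features before training a classifier; that combination step is not shown here.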

Cross-Domain Label-Adaptive Stance Detection

Hardalov, Arora, Nakov et al. 2021

SUper Team at SemEval-2016 Task 3: Building a Feature-Rich System for Community Question Answering

Mihaylova, Gencheva, Boyanov et al. 2016

We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of various types of features: semantic, lexical, metadata, and user-related. The most important features turned out to be the metadata for the question and for the comment, semantic vectors trained on QatarLiving data, and similarities between the question and the comment for subtasks A and C, and between the original and the related question for subtask B.

Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-training

Hardalov, Arora, Nakov et al. 2022

AAAI

The goal of stance detection is to determine the viewpoint expressed in a piece of text towards a target. These viewpoints or contexts are often expressed in many different languages depending on the user and the platform, which can be a local news outlet, a social media platform, a news forum, etc. Most research on stance detection, however, has been limited to working with a single language and on a few limited targets, with little work on cross-lingual stance detection. Moreover, non-English sources of labelled data are often scarce and present additional challenges. Recently, large multilingual language models have substantially improved the performance on many non-English tasks, especially those with a limited number of examples. This highlights the importance of model pre-training and its ability to learn from few examples. In this paper, we present the most comprehensive study of cross-lingual stance detection to date: we experiment with 15 diverse datasets in 12 languages from 6 language families, and with 6 low-resource evaluation settings each. For our experiments, we build on pattern-exploiting training (PET), proposing the addition of a novel label encoder to simplify the verbalisation procedure. We further propose sentiment-based generation of stance data for pre-training, which yields a sizeable improvement of more than 6% F1 absolute in few-shot learning settings compared to several strong baselines.

Towards Automated Customer Support

Hardalov, Koychev, Nakov 2018

Recent years have seen growing interest in conversational agents, such as chatbots, which are a very good fit for automated customer support because the domain in which they need to operate is narrow. This interest was in part inspired by recent advances in neural machine translation, especially the rise of sequence-to-sequence (seq2seq) and attention-based models such as the Transformer, which have been applied to various other tasks and have opened new research directions in question answering, chatbots, and conversational systems. Still, in many cases, it might be feasible and even preferable to use simple information retrieval techniques. Thus, here we compare three different models: (i) a retrieval model, (ii) a sequence-to-sequence model with attention, and (iii) a Transformer. Our experiments with the Twitter Customer Support Dataset, which contains over two million posts from customer support services of twenty major brands, show that the seq2seq model outperforms the other two in terms of semantics and word overlap.
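To make the "simple information retrieval" option in the abstract above concrete, the following is a minimal Python sketch of a retrieval-style responder: it returns the stored answer whose question is most similar to the incoming query under TF-IDF cosine similarity. The class name `RetrievalResponder`, the whitespace tokenizer, and the smoothed IDF formula are assumptions for this example; the abstract does not describe the paper's retrieval model in this detail.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


class RetrievalResponder:
    """Answer a query with the reply attached to the most similar stored question."""

    def __init__(self, pairs):
        # pairs: list of (question, answer) strings from past conversations.
        self.pairs = pairs
        docs = [Counter(tokenize(q)) for q, _ in pairs]
        n_docs = len(docs)
        # Document frequency: in how many stored questions each term occurs.
        df = Counter(term for doc in docs for term in doc)
        # Smoothed inverse document frequency.
        self.idf = {
            t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()
        }
        self.vectors = [self._weight(doc) for doc in docs]

    def _weight(self, counts):
        # Terms unseen at indexing time get weight 0 and are effectively ignored.
        return {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    def respond(self, query):
        query_vec = self._weight(Counter(tokenize(query)))
        best = max(
            range(len(self.pairs)),
            key=lambda i: self._cosine(query_vec, self.vectors[i]),
        )
        return self.pairs[best][1]
```

A responder like this can only reuse answers it has already seen, which is the trade-off against the generative seq2seq and Transformer models compared in the abstract.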

Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection

Hardalov, Koychev, Nakov 2020

Preprint

A Survey on Stance Detection for Mis- and Disinformation Identification

Hardalov, Arora, Nakov et al. 2022

Understanding attitudes expressed in texts, also known as stance detection, plays an important role in systems for detecting false information online, be it misinformation (unintentionally false) or disinformation (intentionally false information). Stance detection has been framed in different ways, including (a) as a component of fact-checking, rumour detection, and detecting previously fact-checked claims, or (b) as a task in its own right. While there have been prior efforts to contrast stance detection with other related tasks such as argumentation mining and sentiment analysis, there is no existing survey examining the relationship between stance detection and mis- and disinformation detection. Here, we aim to bridge this gap by reviewing and analysing existing work in this area, with mis- and disinformation in focus, and discussing lessons learnt and future challenges.

EXAMS: A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering

Hardalov, Mihaylov, Zlatkova et al. 2020

We propose Eχαµs, a new benchmark dataset for cross-lingual and multilingual question answering for high school examinations. We collected more than 24,000 high-quality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from Natural Sciences and Social Sciences, among others. Eχαµs offers a fine-grained evaluation framework across multiple languages and subjects, which allows precise analysis and comparison of various models. We perform various experiments with existing top-performing multilingual pre-trained models and we show that Eχαµs offers multiple challenges that require multilingual knowledge and reasoning in multiple domains. We hope that Eχαµs will enable researchers to explore challenging reasoning and knowledge transfer methods and pre-trained models for school question answering in various languages, which was not possible before. The data, code, pre-trained models, and evaluation are available at http://github.com/mhardalov/exams-qa.
