Pre-trained language models (PLMs) have demonstrated their effectiveness for a broad range of information retrieval and natural language processing tasks. As a core component of PLMs, multi-head self-attention is appealing for its ability to jointly attend to information from different positions. However, researchers have found that PLMs often exhibit fixed attention patterns regardless of the input (e.g., attending excessively to '[CLS]' or '[SEP]'), which we argue may cause them to neglect important information at other positions. In this work, we propose a simple yet effective attention guiding mechanism that improves the performance of PLMs by encouraging attention toward established goals. Specifically, we propose two kinds of attention guiding methods, i.e., attention map discrimination guiding (MDG) and attention pattern decorrelation guiding (PDG). The former explicitly encourages diversity among multiple self-attention heads so that they jointly attend to information from different representation subspaces, while the latter encourages self-attention to attend to as many different positions of the input as possible. We conduct experiments with multiple general pre-trained models (i.e., BERT, ALBERT, and RoBERTa) and domain-specific pre-trained models (i.e., BioBERT, Clinical-BERT, BlueBERT, and SciBERT) on three benchmark datasets (i.e., MultiNLI, MedNLI, and Cross-genre-IR). Extensive experimental results demonstrate that the proposed MDG and PDG bring consistent performance improvements on all datasets with high efficiency and low cost.
CCS CONCEPTS
• Information systems → Clustering and classification; Content analysis and feature selection; • Computing methodologies → Contrastive learning.
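To make the two guiding objectives concrete, the following is a minimal, illustrative sketch of how a head-diversity (MDG-style) regularizer and a position-coverage (PDG-style) regularizer could be computed from a Transformer's attention maps in PyTorch. The tensor layout, loss forms, and weighting coefficients are assumptions for illustration only and are not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact formulation) of two attention-guiding
# regularizers: an MDG-style term that pushes the attention maps of different heads
# apart (diversity), and a PDG-style term that pushes attention to cover many input
# positions instead of collapsing onto a few tokens such as '[CLS]' or '[SEP]'.
import torch
import torch.nn.functional as F


def mdg_loss(attn):
    """Head-diversity (MDG-style) regularizer, sketch.

    attn: [batch, heads, seq_len, seq_len] attention probabilities.
    Penalizes pairwise cosine similarity between heads' attention maps.
    """
    b, h, s, _ = attn.shape
    flat = F.normalize(attn.reshape(b, h, -1), dim=-1)   # one vector per head
    sim = torch.matmul(flat, flat.transpose(1, 2))       # [b, h, h] similarities
    off_diag = sim - torch.diag_embed(torch.diagonal(sim, dim1=1, dim2=2))
    return off_diag.abs().sum(dim=(1, 2)).mean() / (h * (h - 1))


def pdg_loss(attn):
    """Position-coverage (PDG-style) regularizer, sketch.

    Encourages attention mass to spread over many key positions by maximizing
    the entropy of the per-position attention each head distributes.
    """
    received = attn.mean(dim=2)                           # [b, h, seq_len] mass per key
    received = received / received.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    entropy = -(received * (received + 1e-9).log()).sum(dim=-1)
    return -entropy.mean()                                # minimize negative entropy


# Example: combine with a task loss during fine-tuning (coefficients are assumptions).
attn = torch.softmax(torch.randn(2, 12, 16, 16), dim=-1)  # dummy attention maps
task_loss = torch.tensor(0.0)
total_loss = task_loss + 0.1 * mdg_loss(attn) + 0.1 * pdg_loss(attn)
```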