Yejin Bang scite author profile

Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.

show abstract

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Bang¹,

Cahyawijaya²,

Lee³

et al. 2023

Preprint

View full text Add to dashboard Cite

Towards Few-shot Fact-Checking via Perplexity

Lee¹,

Bang²,

Madotto³

et al. 2021

View full text Add to dashboard Cite

Few-shot learning has drawn researchers' attention to overcome the problem of data scarcity. Recently, large pre-trained language models have shown great performance in fewshot learning for various downstream tasks, such as question answering and machine translation. Nevertheless, little exploration has been made to achieve few-shot learning for the fact-checking task. However, fact-checking is an important problem, especially when the amount of information online is growing exponentially every day. In this paper, we propose a new way of utilizing the powerful transfer learning ability of a language model via a perplexity score. The most notable strength of our methodology lies in its capability in fewshot learning. With only two training samples, our methodology can already outperform the Major Class baseline by more than an absolute 10% on the F1-Macro metric across multiple datasets. Through experiments, we empirically verify the plausibility of the rather surprising usage of the perplexity score in the context of fact-checking and highlight the strength of our few-shot methodology by comparing it to strong fine-tuning-based baseline models. Moreover, we construct and publicly release two new fact-checking datasets related to COVID-19.

show abstract

Living near nuclear power plants and thyroid cancer risk: A systematic review and meta-analysis

Bang

Lee

2016

Environment International

View full text Add to dashboard Cite

Survey of Hallucination in Natural Language Generation

Ji¹,

Lee²,

Frieske³

et al. 2022

Preprint

View full text Add to dashboard Cite

Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent natural language generation, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended texts, which degrades the system performance and fail to meet user expectations in many real-world scenarios. In order to address this issue, there have been studies in measuring and mitigating hallucinated texts. However there has not been a comprehensive review of the state-of-the-art in hallucination detection and mitigation.In this survey, we provide a broad overview of the research progress and challenges in the hallucination problem of NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; (2) an overview of task-specific research progress for hallucinations in a large set of downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.

show abstract

XPersona: Evaluating Multilingual Personalized Chatbot

Lin¹,

Liu²,

Winata³

et al. 2020

Preprint

View full text Add to dashboard Cite

Personalized dialogue systems are an essential step toward better human-machine interaction.Existing personalized dialogue agents rely on properly designed conversational datasets, which are mostly monolingual (e.g., English), which greatly limits the usage of conversational agents in other languages. In this paper, we propose a multi-lingual extension of Persona-Chat (Zhang et al., 2018), namely XPersona. Our dataset includes persona conversations in six different languages other than English for building and evaluating multilingual personalized agents. We experiment with both multilingual and cross-lingual trained baselines, and evaluate them against monolingual and translation-pipeline models using both automatic and human evaluation. Experimental results show that the multilingual trained models outperform the translationpipeline and that they are on par with the monolingual models, with the advantage of having a single model across multiple languages. On the other hand, the state-of-theart cross-lingual trained models achieve inferior performance to the other models, showing that cross-lingual conversation modeling is a challenging task. We hope that our dataset and baselines 1 will accelerate research in multilingual dialogue systems. * * Equal contributions. Listing order is random 1 Datasets and all the baselines will be released

show abstract

XPersona: Evaluating Multilingual Personalized Chatbot

Lin¹,

Liu²,

Winata³

et al. 2021

View full text Add to dashboard Cite

Personalized dialogue systems are an essential step toward better human-machine interaction. Existing personalized dialogue agents rely on properly designed conversational datasets, which are mostly monolingual (e.g., English), which greatly limits the usage of conversational agents in other languages. In this paper, we propose a multi-lingual extension of Persona-Chat (Zhang et al., 2018), namely XPersona. Our dataset includes persona conversations in six different languages other than English for evaluating multilingual personalized agents. We experiment with both multilingual and cross-lingual trained baselines, and evaluate them against monolingual and translation-pipeline models using both automatic and human evaluation. Experimental results show that the multilingual trained models outperform the translation-pipeline and that they are on par with the monolingual models, with the advantage of having a single model across multiple languages. On the other hand, the state-of-the-art cross-lingual trained models achieve inferior performance to the other models, showing that cross-lingual conversation modeling is a challenging task. We hope that our dataset and baselines 1 will accelerate research in multilingual dialogue systems. * * Equal contributions. Listing order is random 1 The code and dataset are available in https:// github.com/HLTCHKUST/Xpersona.

show abstract

Model Generalization on COVID-19 Fake News Detection

Bang

Ishii

Cahyawijaya

et al. 2021

Preprint

View full text Add to dashboard Cite

Amid the pandemic COVID-19, the world is facing unprecedented infodemic with the proliferation of both fake and real information. Considering the problematic consequences that the COVID-19 fake-news have brought, the scientific community has put effort to tackle it. To contribute to this fight against the infodemic, we aim to achieve a robust model for the COVID-19 fake-news detection task proposed at CONSTRAINT 2021 (FakeNews-19) by taking two separate approaches: 1) fine-tuning transformers based language models with robust loss functions and 2) removing harmful training instances through influence calculation. We further evaluate the robustness of our models by evaluating on different COVID-19 misinformation test set (Tweets-19) to understand model generalization ability. With the first approach, we achieve 98.13% for weighted F1 score (W-F1) for the shared task, whereas 38.18% W-F1 on the Tweets-19 highest. On the contrary, by performing influence data cleansing, our model with 99% cleansing percentage can achieve 54.33% W-F1 score on Tweets-19 with a trade-off. By evaluating our models on two COVID-19 fake-news test sets, we suggest the importance of model generalization ability in this task to step forward to tackle the COVID-19 fake-news problem in online social media platforms.

show abstract

12 3

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yejin Bang

Survey of Hallucination in Natural Language Generation

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Towards Few-shot Fact-Checking via Perplexity

Living near nuclear power plants and thyroid cancer risk: A systematic review and meta-analysis

Survey of Hallucination in Natural Language Generation

XPersona: Evaluating Multilingual Personalized Chatbot

XPersona: Evaluating Multilingual Personalized Chatbot

Model Generalization on COVID-19 Fake News Detection

Contact Info

Product

Resources

About