A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Bang, Yejin; Cahyawijaya, Samuel; Lee, Nayeon; Dai, Wujiao; Su, Dan; Wilie, Bryan; Lovenia, Holy; Ji, Ziwei; Yu, Tiezheng; Chung, Willy; Do, Quyet V.; Xu, Yan; Fung, Pascale

doi:10.48550/arxiv.2302.04023

Cited by 109 publications

(134 citation statements)

References 0 publications


“…Nonetheless, ChatGPT performs poorly on low-resource languages and faces extra challenges handling distant language translation (i.e., English-German translation is considered to be less "distant", compared to English-Hindi translation). A later study [57] confirms that ChatGPT struggles with low-resource languages, although the authors observe that ChatGPT does better in understanding non-Latin scripts than generating them.…”

Section: ChatGPT: Present and Future (mentioning)

confidence: 82%


AugGPT: Leveraging ChatGPT for Text Data Augmentation

Dai, Liu, Liao et al. 2023

Preprint


Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data in the target domain is generally much scarcer and of lowered quality. A natural and widely-used strategy to mitigate such challenges is to perform data augmentation on the training data to better capture the data invariance and increase the sample size. However, current text data augmentation methods either can not ensure the correct labeling of the generated data (lacking faithfulness) or can not ensure sufficient diversity in the generated data (lacking completeness), or both. Inspired by the recent success of large language models, especially the development of ChatGPT, which demonstrated improved language comprehension abilities, in this work, we propose a text data augmentation approach based on ChatGPT (named ChatAug). ChatGPT is trained on data with unparalleled linguistic richness and employs a reinforcement training process with large-scale human feedback, which endows the model with affinity to the naturalness of human language. Our text data augmentation approach ChatAug rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experiment results on few-shot learning text classification tasks show the superior performance of the proposed ChatAug approach over state-of-the-art text data augmentation methods in terms of testing accuracy and distribution of the augmented samples.



“…In addition, it is also possible to use the purely textbased ChatGPT to interact with multimodal data. A group of researchers [57] use HTML Canvas and Python Turtle graphics as media for text-to-image generation. ChatGPT can faithfully generate HTML and Python code, which can be then used to generate desired images.…”

Section: ChatGPT: Present and Future (mentioning)

confidence: 99%

AugGPT: Leveraging ChatGPT for Text Data Augmentation

Dai, Liu, Liao et al. 2023

Preprint


“…There is a widespread belief among experts that the field of natural language processing (NLP) is currently experiencing a paradigm shift [46] as a result of the introduction of LLM (Large Language Models) [47], with chatGPT [48] being the leading example of this new technology. With this new technology, many tasks that previously relied on fine-tuning pre-trained models can now be achieved through prompt engineering, which involves identifying the appropriate instructions to direct the language model (LLM) for specific tasks.…”

Section: Rethinking NL2CMD in the Age of ChatGPT (mentioning)

confidence: 99%

A Transformer-based Approach for Translating Natural Language to Bash Commands

Teng, White et al. 2021

2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)


Translating natural language into Bash Commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two datasets are available, with one based on the other. Both datasets involve scraping through known data sources (through platforms like stack overflow, crowdsourcing, etc.) and hiring experts to validate and correct either the English text or Bash Commands. This paper provides two contributions to research on synthesizing Bash Commands from scratch. First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text. Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets. Since the generation pipeline does not rely on existing Bash Commands, the distribution and types of commands can be custom adjusted. We evaluate the performance of ChatGPT on this task and discuss the potential of using it as a data generator. Our empirical results show how the scale and diversity of our dataset can offer unique opportunities for semantic parsing researchers.


“…It is of imminent importance to evaluate the potential risks behind ChatGPT given its increasing worldwide popularity in diverse applications. While previous efforts have evaluated various aspects of ChatGPT in law (Choi et al, 2023), ethics (Shen et al, 2023), education (Khalil and Er, 2023), and reasoning (Bang et al, 2023), we focus on its robustness (Bengio et al, 2021), which, to our best knowledge, has not been thoroughly evaluated yet. Robustness refers to the ability to withstand disturbances or external factors that may cause it to malfunction or provide inaccurate results.…”

Section: Introduction (mentioning)

confidence: 99%

“…Previous efforts evaluate ChatGPT in different aspects (van Dis et al, 2023). Bang et al (2023) proposes a multi-task, multi-modal, and multilingual evaluation of ChatGPT on different tasks. They showed that ChatGPT performs reasonably well on most tasks, while it does not bring great performance on low-resource tasks.…”

Section: Introduction (mentioning)

confidence: 99%

On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective

Wang, Hu, Hou et al. 2023

Preprint


ChatGPT is a recent chatbot service released by OpenAI and is receiving increasing attention over the past few months. While evaluations of various aspects of ChatGPT have been done, its robustness, i.e., the performance to unexpected inputs, is still unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct a thorough evaluation of the robustness of ChatGPT from the adversarial and out-of-distribution (OOD) perspective. To do so, we employ the AdvGLUE and ANLI benchmarks to assess adversarial robustness and the Flipkart review and DDXPlus medical diagnosis datasets for OOD evaluation. We select several popular foundation models as baselines. Results show that ChatGPT shows consistent advantages on most adversarial and OOD classification and translation tasks. However, the absolute performance is far from perfection, which suggests that adversarial and OOD robustness remains a significant threat to foundation models. Moreover, ChatGPT shows astounding performance in understanding dialogue-related texts and we find that it tends to provide informal suggestions for medical tasks instead of definitive answers. Finally, we present in-depth discussions of possible research directions.
