2023
DOI: 10.48550/arxiv.2302.04023
Preprint

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Cited by 109 publications (134 citation statements)
References: 0 publications
“…Nonetheless, ChatGPT performs poorly on low-resource languages and faces extra challenges handling distant language translation (i.e., English-German translation is considered to be less "distant", compared to English-Hindi translation). A later study [57] confirms that ChatGPT struggles with low-resource languages, although the authors observe that ChatGPT does better in understanding non-Latin scripts than generating them.…”
Section: ChatGPT: Present and Future
Citation type: mentioning
confidence: 82%
“…In addition, it is also possible to use the purely text-based ChatGPT to interact with multimodal data. A group of researchers [57] use HTML Canvas and Python Turtle graphics as media for text-to-image generation. ChatGPT can faithfully generate HTML and Python code, which can be then used to generate desired images.…”
Section: ChatGPT: Present and Future
Citation type: mentioning
confidence: 99%
“…There is a widespread belief among experts that the field of natural language processing (NLP) is currently experiencing a paradigm shift [46] as a result of the introduction of LLM (Large Language Models) [47], with chatGPT [48] being the leading example of this new technology. With this new technology, many tasks that previously relied on fine-tuning pre-trained models can now be achieved through prompt engineering, which involves identifying the appropriate instructions to direct the language model (LLM) for specific tasks.…”
Section: Rethinking NL2CMD in the Age of ChatGPT
Citation type: mentioning
confidence: 99%
“…It is of imminent importance to evaluate the potential risks behind ChatGPT given its increasing worldwide popularity in diverse applications. While previous efforts have evaluated various aspects of ChatGPT in law (Choi et al, 2023), ethics (Shen et al, 2023), education (Khalil and Er, 2023), and reasoning (Bang et al, 2023), we focus on its robustness (Bengio et al, 2021), which, to our best knowledge, has not been thoroughly evaluated yet. Robustness refers to the ability to withstand disturbances or external factors that may cause it to malfunction or provide inaccurate results.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Previous efforts evaluate ChatGPT in different aspects (van Dis et al, 2023). Bang et al (2023) proposes a multi-task, multi-modal, and multilingual evaluation of ChatGPT on different tasks. They showed that ChatGPT performs reasonably well on most tasks, while it does not bring great performance on low-resource tasks.…”
Section: Introduction
Citation type: mentioning
confidence: 99%