2023
DOI: 10.1145/3571730

Survey of Hallucination in Natural Language Generation

Abstract: Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the…

Cited by 590 publications (370 citation statements)
References 114 publications
“…Prior work has shown that neural language models suffer from contradictions and inconsistency as well as a tendency to “hallucinate,” or generate factually incorrect information (29). In the complex domain of Diplomacy, dialogue models exhibit both these problems and other more subtle mistakes, such as deviations from the intents used to control the message or blunders in the strategic content of the message.…”
Section: Methods (mentioning)
Confidence: 99%
“…Out of all explanations, including incorrect ones, approximately 37% included at least one hallucinated reference or authority. Research is ongoing on the optimal degree of hallucination and techniques for mitigating unwanted hallucination [38], and we will continue to explore these questions and applications in future work. For text-davinci-003, the average is reported across all runs; for other models, a subset of representative prompts and parameters were included.…”
Section: Assessment (mentioning)
Confidence: 99%
“…Although these natural language processing techniques have the potential to greatly improve the efficiency and effectiveness of software development, there are concerns regarding their use. One problem with large language models is that they can sometimes "hallucinate" [24]. A recent study found that a state-of-the-art model was more likely to generate code containing a vulnerability if the query asked for code without that vulnerability [25].…”
Section: B. Using Artificial Intelligence To Generate Code (mentioning)
Confidence: 99%