2022
DOI: 10.48550/arxiv.2212.08037
Preprint

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models


Cited by 10 publications (19 citation statements)
References 0 publications
“…Madaan et al. present an iterative self-refinement algorithm that alternates between feedback and refinement. Additionally, LLMs are used to evaluate attribution between generated answers and references [2,51].…”
Section: Related Work, 2.1 Open-domain Question Answering
Citation type: mentioning; confidence: 99%
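For context, the alternating feedback-and-refinement loop this statement refers to can be sketched in a few lines. This is a minimal illustration only: `generate`, `critique`, and `refine` are hypothetical stand-ins for LLM calls, and the stopping test is a toy, not anything from the cited papers.

```python
from typing import Callable

def self_refine(
    question: str,
    generate: Callable[[str], str],          # hypothetical LLM call: question -> draft answer
    critique: Callable[[str, str], str],     # hypothetical LLM call: (question, answer) -> feedback
    refine: Callable[[str, str, str], str],  # hypothetical LLM call: (question, answer, feedback) -> revision
    max_rounds: int = 3,
) -> str:
    """Alternate between feedback and refinement until the critic is satisfied."""
    answer = generate(question)
    for _ in range(max_rounds):
        feedback = critique(question, answer)
        if "no issues" in feedback.lower():  # toy stopping criterion for illustration
            break
        answer = refine(question, answer, feedback)
    return answer
```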
“…Current ODQA methods follow two main paradigms when preparing to answer questions: (1) The retrieve-then-read paradigm retrieves pertinent evidence documents from an external corpus and generates an answer based on them [16,18]. Since retrieval models often rely on well-curated corpora like Wikipedia, they can provide highly factual and accurate information relevant to the question; (2) The generate-then-read paradigm directly employs language models to generate virtual documents [49], diversifying the evidence sources and enhancing answer coverage for the question.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
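The two paradigms contrasted in this statement differ only in where the evidence documents come from. A minimal sketch under stated assumptions: the corpus, the lexical retriever, and the `reader` / `generate_virtual_docs` callables below are hypothetical illustrations, not components from the cited papers.

```python
from typing import Callable

CORPUS = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
]

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))[:k]

def retrieve_then_read(question: str,
                       reader: Callable[[str, list[str]], str]) -> str:
    """Paradigm (1): fetch evidence from an external corpus, then read it."""
    return reader(question, retrieve(question, CORPUS))

def generate_then_read(question: str,
                       generate_virtual_docs: Callable[[str], list[str]],
                       reader: Callable[[str, list[str]], str]) -> str:
    """Paradigm (2): let a language model write 'virtual' evidence, then read it."""
    return reader(question, generate_virtual_docs(question))
```

In both cases the reader sees the same (question, documents) input; only the provenance of the documents changes, which is why retrieve-then-read inherits the factuality of its corpus while generate-then-read trades that for coverage.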
“…Attribution and Fact Checking Our goal is closely related to works that check whether LM-generated texts are faithful to a given source text (Bohnet et al., 2022; Honovich et al., 2022). This problem has been addressed via several approaches,…”
Section: Related Work
Citation type: mentioning; confidence: 99%
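Faithfulness checking of this kind is usually framed as deciding whether a source text supports a generated claim. The sketch below uses a deliberately crude token-overlap score as a stand-in for the learned NLI- or LLM-based evaluators these works discuss; the function names and the threshold are hypothetical.

```python
def attribution_score(claim: str, source: str) -> float:
    """Crude faithfulness proxy: fraction of claim tokens found in the source.
    Real attribution evaluators use NLI models or LLM judges instead."""
    claim_tokens = set(claim.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & set(source.lower().split())) / len(claim_tokens)

def is_attributable(claim: str, source: str, threshold: float = 0.8) -> bool:
    """Label a claim attributable if enough of it is grounded in the source."""
    return attribution_score(claim, source) >= threshold
```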
“…Modern language models (LMs) often generate inconsistent (Elazar et al., 2021), non-attributable (Rashkin et al., 2021; Bohnet et al., 2022; Liu et al., 2023a), or factually incorrect text (Tam et al., 2022; Devaraj et al., 2022; Maynez et al., 2020), thus negatively impacting the reliability of these models (Amodei et al., 2016; Hendrycks et al., 2021). This has prompted the community to develop methods that calibrate the confidence of model predictions to better align with their quality (Brundage et al., 2020).…”
Section: Introduction
Citation type: mentioning; confidence: 99%
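Calibration here means that a model's stated confidence should match its empirical accuracy. One standard way to quantify the mismatch is expected calibration error (ECE); the sketch below is a straightforward equal-width-bin implementation, included only to make the notion concrete.

```python
def expected_calibration_error(confidences: list[float],
                               correct: list[bool],
                               n_bins: int = 10) -> float:
    """ECE: accuracy/confidence gap, averaged over equal-width confidence
    bins and weighted by the fraction of predictions in each bin."""
    assert len(confidences) == len(correct)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into the last bin
        bins[b].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece

# e.g. expected_calibration_error([0.9, 0.8, 0.6], [True, False, True]) ≈ 0.43
```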