Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2023
DOI: 10.18653/v1/2023.acl-long.427
RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs

Abstract: Despite their unprecedented success, even the largest language models make mistakes. Similar to how humans learn and improve using feedback, previous work proposed providing language models with natural language feedback to guide them in repairing their outputs. Because human-generated critiques are expensive to obtain, researchers have devised learned critique generators in lieu of human critics while assuming one can train downstream models to utilize generated feedback. However, this approach does not apply…
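
The abstract describes a critique-then-repair pipeline: a learned critic produces natural language feedback on a task model's draft, and the task model revises its output conditioned on that feedback. The following is a minimal, hypothetical sketch of that loop, not the authors' implementation; `task_model` and `critic_model` are placeholder callables, and the reinforcement-learning training of the critic that RL4F contributes is not shown.

# Hypothetical sketch of a critique-and-repair loop (not the paper's code).
# `task_model` and `critic_model` stand in for any text-to-text models,
# e.g. fine-tuned checkpoints or API-backed generators.

from typing import Callable

def critique_and_repair(
    task_input: str,
    task_model: Callable[[str], str],
    critic_model: Callable[[str], str],
    num_rounds: int = 1,
) -> str:
    """Generate a draft, ask the critic for feedback, then revise the draft."""
    draft = task_model(task_input)
    for _ in range(num_rounds):
        # The critic sees the task input and the current draft and returns
        # natural language feedback describing what should be fixed.
        feedback = critic_model(
            f"Task: {task_input}\nDraft answer: {draft}\nFeedback:"
        )
        # The task model conditions on that feedback to produce a revision.
        draft = task_model(
            f"Task: {task_input}\nPrevious answer: {draft}\n"
            f"Feedback: {feedback}\nRevised answer:"
        )
    return draft

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without any model weights.
    def toy_task(prompt: str) -> str:
        return "42" if "Revised answer:" in prompt else "41"

    def toy_critic(prompt: str) -> str:
        return "The draft is off by one; recompute the product."

    print(critique_and_repair("What is 6 * 7?", toy_task, toy_critic))

In the real setting, the repair step is performed by a (possibly black-box) task model, while the critic is a smaller model whose feedback is optimized, in RL4F, with reinforcement learning against the downstream task's end performance.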

Cited by 6 publications (2 citation statements)
References 24 publications
“…CRITIC (Gou et al., 2023) proposes the use of a suite of specialized tools for a variety of tasks, such as code interpreters, calculators, or search engines, to generate critiques for the LLM's generated output. Moreover, approaches such as REFINER (Paul et al., 2023), CodeRL, and RL4F (Akyurek et al., 2023) propose to train a specialized critic to provide feedback to the generator model.…”
Section: Related Work
confidence: 99%
“…Various methods have been proposed to tackle this problem (Pan et al., 2023). From training-time correction (Li et al., 2019; Jauregi Unanue et al., 2021; Zelikman et al., 2022; Huang et al., 2022) to post-generation refinement (Madaan et al., 2023; Shinn et al., 2023; Zhang et al., 2023; Pan et al., 2023; Yu et al., 2023; Gou et al., 2023; Paul et al., 2023; Akyurek et al., 2023), these methods have shown the impact that iterative self-refinement and proper feedback can have on the performance of LLMs.…”
Section: Introduction
confidence: 99%