2022
DOI: 10.48550/arxiv.2210.02441
Preprint

Ask Me Anything: A simple strategy for prompting language models

Abstract: Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt that demonstrates how to perform the task and no additional training. Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly perfect prompt for a task. To mitigate the high degree of effort involved in prompting, we instead ask whether collecting multipl…

Cited by 13 publications (20 citation statements)
References 18 publications (45 reference statements)
“…Crowdworkers have different viewpoints [6,24,83] and sourcing perspectives from expert crowdworkers [27] or from workers with various skill levels [102] can extend this variety. For LLMs, leveraging their stochastic nature (e.g., with a higher temperature parameter setting) or using different models or prompt variants (including role-based prompts) can source a variety of responses [12,117,143]. In some cases one can automatically calculate the optimal mix of LLMs for a task [100].…”
Section: Response Diversity
confidence: 99%
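The strategy quoted above, sourcing varied responses by crossing prompt variants with temperature settings and then aggregating, can be sketched as follows. This is a minimal illustration, not the cited papers' method: `sample_response` is a hypothetical stand-in for a real LLM call (its widening-pool behavior only mimics how temperature broadens the output distribution), and majority voting is one simple aggregation choice.

```python
import random
from collections import Counter

def sample_response(prompt: str, temperature: float, seed: int) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real API.
    A higher temperature widens the pool of outputs it can return,
    loosely mimicking a real sampling temperature."""
    rng = random.Random(f"{prompt}|{temperature}|{seed}")
    pool = ["positive"] * 6 + ["negative"] * 2 + ["neutral"]
    k = max(1, min(len(pool), 1 + int(temperature * 8)))
    return rng.choice(pool[:k])

def diverse_responses(task: str, prompt_variants, temperatures, n_samples=3):
    """Source a variety of responses by crossing prompt variants
    (including role-based prompts) with temperature settings."""
    votes = Counter()
    for variant in prompt_variants:
        for temp in temperatures:
            for seed in range(n_samples):
                votes[sample_response(variant.format(task=task), temp, seed)] += 1
    return votes

variants = ["Classify the sentiment: {task}",
            "As a film critic, label this review: {task}"]  # role-based variant
votes = diverse_responses("A gripping, heartfelt film.", variants, [0.2, 0.7, 1.0])
prediction = votes.most_common(1)[0][0]  # aggregate by majority vote
```

With 2 variants, 3 temperatures, and 3 samples each, 18 responses are collected; the low-temperature calls agree while the high-temperature calls contribute the variety.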
“…Simplifying tasks can increase quality [28,83,120,142]. Another strategy is to adapt the subtask to fit the worker's capabilities [12,23,109,120]. For example, researchers found that crowdworkers were better at generating predictive features than at estimating if a feature is predictive; so they adapted the workflow accordingly [25].…”
Section: Response Diversity
confidence: 99%
“…into the importance of prompt engineering, the development of frameworks and strategies for structuring prompts, and the evaluation of LLM performance on complex tasks [5, 6, 24-29]. These studies highlight the significance of prompt engineering when working with models like GPT-4 and provide relevant context for using LLMs in advanced research, such as the present study on SRT. This work differs from these studies in that it specifically evaluates GPT-4, which is posited to possess emergent, foundational AGI behaviors.…”
Section: Related Work
confidence: 92%
“…SuperGLUE: We evaluate models on the SuperGLUE benchmark (Wang et al., 2019) with the parsing pipeline of Arora et al. (2022). For all tasks except WiC, CB, and BoolQ, we generate a response using greedy decoding, then check for the gold label.…”
Section: A3 Downstream Evaluation
confidence: 99%
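The evaluation recipe quoted above, generate once with greedy decoding and then check for the gold label in the output, can be sketched as below. This is an assumed reading of that protocol, not the cited pipeline itself; `canned.get` is a hypothetical stand-in for a greedy-decoding model call (e.g., sampling disabled).

```python
def gold_label_accuracy(examples, generate):
    """Score examples by generating one response per prompt and checking
    whether the gold label string appears in the (normalized) output."""
    hits = sum(ex["gold"].lower() in generate(ex["prompt"]).strip().lower()
               for ex in examples)
    return hits / len(examples)

# Hypothetical stand-in for a greedy-decoded model: fixed prompt -> response map.
canned = {"Is the sky blue? Answer yes or no.": "Yes, it is.",
          "Is fire cold? Answer yes or no.": "no"}
examples = [{"prompt": "Is the sky blue? Answer yes or no.", "gold": "yes"},
            {"prompt": "Is fire cold? Answer yes or no.", "gold": "no"}]

acc = gold_label_accuracy(examples, canned.get)  # acc == 1.0
```

Substring matching after lowercasing is a deliberately loose check: it credits "Yes, it is." for the gold label "yes", which matches the quoted "check for the gold label" phrasing rather than an exact-match criterion.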