Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2023
DOI: 10.18653/v1/2023.acl-long.352

Parallel Context Windows for Large Language Models

Cited by 10 publications (3 citation statements). References: 0 publications.
“…Some formal techniques have been presented, such as Explicit instruction (giving the LLM a clear direction to do something) [187], System-specific instruction (posing a question for the LLM to answer), Formatting with an example (providing a sample question and its answer and asking the LLM to answer in the same manner), Control tokens (using special keywords in the prompt so the LLM answers while respecting specified criteria) [188], and Interaction and iteration/chaining (interacting with the model iteratively, refining each reply, to reach a good answer) [79].…”
Section: B. Generative AI Design Cycle (mentioning)
confidence: 99%
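The techniques listed in this citation statement are all prompt-construction patterns. The sketch below (not taken from the cited works) illustrates two of them, "formatting with an example" and "control tokens", using plain string assembly; the token names and prompt layout are illustrative assumptions.

```python
# Minimal sketch of few-shot formatting plus control tokens.
# The tag names (<task>, <example_question>, ...) are assumptions, not a real API.

def build_prompt(question: str, example_q: str, example_a: str) -> str:
    """Compose a prompt that shows the model one worked example
    and marks each field with simple control tokens."""
    return (
        "<task>Answer the question in the same style as the example.</task>\n"
        f"<example_question>{example_q}</example_question>\n"
        f"<example_answer>{example_a}</example_answer>\n"
        f"<question>{question}</question>\n"
        "<answer>"
    )

if __name__ == "__main__":
    prompt = build_prompt(
        question="What is the capital of Canada?",
        example_q="What is the capital of France?",
        example_a="Paris",
    )
    print(prompt)  # the assembled prompt would then be sent to an LLM of choice
```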
“…However, the study also found that GPT-4 is less proficient in tasks that require complex reasoning or specific domain knowledge, highlighting the limitations of these models [24]. Recent research has addressed various limitations of large language models, including the hand-crafting of task-specific demonstrations [25], the evaluation of code synthesis [26], the cost barrier associated with large models [27], the evaluation protocol for conversational recommendation systems [28], and the context window restriction for off-the-shelf LLMs [29].…”
Section: Foundation Models and Artificial General Intelligence (AGI) (mentioning)
confidence: 99%
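The last limitation mentioned here, the context window restriction for off-the-shelf LLMs [29], is the problem addressed by Parallel Context Windows. As a rough illustration only, the sketch below shows the chunking step behind that general idea: a long context is split into windows that each fit within the model's native limit. It does not reproduce the paper's actual mechanism (per-window position IDs and restricted attention), and the token counts are assumptions.

```python
# Minimal sketch: split a long token sequence into consecutive windows
# that each fit within an assumed native context limit.

def split_into_windows(tokens: list[int], window_size: int) -> list[list[int]]:
    """Split a token sequence into consecutive windows of at most window_size tokens."""
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]

if __name__ == "__main__":
    long_context = list(range(10_000))   # stand-in for a 10k-token context
    windows = split_into_windows(long_context, window_size=4_096)
    print(len(windows), [len(w) for w in windows])  # 3 windows: 4096, 4096, 1808
```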
“…The typical transformer neural network architecture (see Box 1 for a glossary of key terminology) creates meaningful embeddings using the attention mechanism. This architecture consists of encoders, which process input data into a context vector, and decoders, which translate the context vector into the desired output 2–8. Decoder‐only LLMs, such as OpenAI's ChatGPT, are autoregressive models, indicating that the text‐generation process predicts the next word using all preceding words, and the final outcome of the model is in a form that humans can readily recognize 9,10.…”
Section: Introduction (mentioning)
confidence: 99%
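The autoregressive decoding described in this citation statement, predicting each new token from all preceding tokens, can be summarized by a short generation loop. The sketch below uses a toy stand-in for the model (not any real LLM's implementation); only the loop structure is the point.

```python
# Minimal sketch of autoregressive decoding: each new token is chosen
# conditioned on the full prefix of previously generated tokens.

from typing import Callable, List

def generate(next_token: Callable[[List[str]], str],
             prompt: List[str], max_new_tokens: int) -> List[str]:
    """Greedy autoregressive loop: condition on the entire prefix at every step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))  # the prediction sees all preceding tokens
    return tokens

if __name__ == "__main__":
    # Toy next-token function: repeats the last token with a tick mark appended.
    toy_model = lambda prefix: prefix[-1] + "'"
    print(generate(toy_model, ["hello"], max_new_tokens=3))
    # ['hello', "hello'", "hello''", "hello'''"]
```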