Language Models that Seek for Knowledge: Modular Search &amp; Generation for Dialogue and Prompt Completion

Shuster, Kurt; Komeili, Mojtaba; Adolphs, Leonard; Roller, Stephen; Szlam, Arthur; Weston, Jason

doi:10.48550/arxiv.2203.13224

Cited by 8 publications

(12 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…While proposed algorithms work for any black-box optimization problems, they are most suited for high-dimensional optimization problems where the evaluation of a single updated parameter set is quick, but the sheer number of dimensions makes the search for a valid update excessively slow for a single worker, with a notable application being the neuroevolution of large ANNs. We foresee that such a setting is of particular relevance to multipurpose super neural networks, such as PathNet [16], as well as for conversational agents derived from LLMs [20,51]. 6 As we mention in the introduction, we use Genetic Algorithm only to designate EAs that has "chromosome" and "cross-over" phases, consistently with the nomenclature introduced in [24] Finally, more accessible and reliable byzantine-resilient machine learning allows a variety of entities to poll together their computational resources to train models they could not have trained individually, which has the potential of democratizing state-of-theart ML research in a non-differentiable setting.…”

Section: Discussionmentioning

confidence: 99%

“…There are theoretical reasons why approaches that reduce learning in a non-differentiable setting to a differentiable one would underperform compared to gradient-free black-box optimization approaches [39,46]. This is particularly relevant now, given that the latest development in the LLM field is conversation agents, which rely on optimization based on discrete feedback to align their behavior on user expectations and non-differentiable layers of hard attention to solve the issues with rule-following that plague them [20,43,51].…”

Section: Gradient-free Learningmentioning

confidence: 99%

“…This makes it interesting even in the setting allowing for gradientbased learning because, unlike differentiable layers, deterministic rules can provide deterministic guarantees on AI model decisions, which is critical in high-stakes applications. In particular, for LLMs, it has the potential of solving the long-term instruction retaining problem, currently limiting their application [20,51].…”

Section: Evolutionary Searchmentioning

confidence: 99%

See 2 more Smart Citations

Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search

Kucharavy

Monti

Guerraoui

et al. 2023

Proceedings of the Companion Conference on Genetic and Evolutionary Computation

View full text Add to dashboard Cite

Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them.Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly pressing concern. While distributed byzantine resilient algorithms have been proposed in a differentiable setting, none exist in a gradient-free setting.The goal of this work is to address this shortcoming. For that, we introduce a more general definition of byzantine-resilience in ML -the model-consensus, that extends the definition of the classical distributed consensus. We then leverage this definition to show that a general class of gradient-free ML algorithms -(1, 𝜆)-Evolutionary Search -can be combined with classical distributed consensus algorithms to generate gradient-free byzantine-resilient distributed learning algorithms. We provide proofs and pseudo-code for two specific cases -the Total Order Broadcast and proof-of-work leader election.To our knowledge, this is the first time a byzantine resilience in gradient-free ML was defined, and algorithms to achieve it -were proposed. CCS CONCEPTS• Computing methodologies → Genetic programming; Machine learning algorithms; Machine learning; • Theory of computation → Distributed algorithms.

show abstract

Section: Discussionmentioning

confidence: 99%

Section: Gradient-free Learningmentioning

confidence: 99%

Section: Evolutionary Searchmentioning

confidence: 99%

See 1 more Smart Citation

Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search

Kucharavy

Monti

Guerraoui

et al. 2023

Proceedings of the Companion Conference on Genetic and Evolutionary Computation

View full text Add to dashboard Cite

show abstract

“…To this end, knowledgeaware works ground dialogue response on external knowledge (Yu et al, 2020). The knowledge can be gathered from text-based encyclopedia (Lin et al, 2020a;Zhao et al, 2020), commonsense knowledge , search engines (Shuster et al, 2022), and many others (Moghe et al, 2020;. Although such works have achieved promising results, they often only focus on high-resource English or Chinese (Kann et al, 2022).…”

Section: Related Workmentioning

confidence: 99%

Exploring the Effectiveness of Multi-Lingual Commonsense Knowledge-Aware Open-Domain Dialogue Response Generation

Wu,

Yu,

Che

et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023

View full text Add to dashboard Cite

Prior works have shown the promising results of commonsense knowledge-aware models in improving informativeness while reducing the hallucination issue. Nonetheless, prior works often can only use monolingual knowledge whose language is consistent with the dialogue context. Except for a few high-resource languages, such as English and Chinese, most languages suffer from insufficient knowledge issues, especially minority languages. To this end, this work proposes a new task, Multi-Lingual Commonsense Knowledge-Aware Response Generation (MCKRG), which tries to use commonsense knowledge in other languages to enhance the current dialogue generation. Then, we construct a MCKRG dataset MCK-Dialog 1 of seven languages with multiple alignment methods. Finally, we verify the effectiveness of using multi-lingual commonsense knowledge with a proposed MCK-T5 model. Extensive experimental results demonstrate the great potential of using multi-lingual commonsense knowledge in high-resource and lowresource languages. To the best of our knowledge, this work is the first to explore Multi-Lingual Commonsense Knowledge-Aware Response Generation .

show abstract

“…We took R2C2 (22) as our base model-a 2.7 billion-parameter Transformer-based (23) encoder-decoder model pretrained on text from the internet by using a BART denoising objective (24). The base pretrained model was then further trained on WebDiplomacy (Methods, Data) through standard maximum likelihood estimation.…”

Section: Imitation Dialogue Modelmentioning

confidence: 99%

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Bakhtin¹,

Brown²,

Dinan³

et al. 2022

Science

View full text Add to dashboard Cite

Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy , a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.

show abstract

Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

Cited by 8 publications

References 0 publications

Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search

Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search

Exploring the Effectiveness of Multi-Lingual Commonsense Knowledge-Aware Open-Domain Dialogue Response Generation

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Contact Info

Product

Resources

About