Stephen Roller scite author profile

Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we highlight other ingredients. Good conversation requires blended skills: providing engaging talking points, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available. Human evaluations show our best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.

show abstract

What makes a good conversation? How controllable attributes affect human judgments

See¹,

Roller²,

Kiela³

et al. 2019

186

276

View full text Add to dashboard Cite

A good conversation requires balance -between simplicity and detail; staying on topic and changing it; asking questions and answering them. Although dialogue agents are commonly evaluated via human judgments of overall quality, the relationship between quality and these individual factors is less well-studied. In this work, we examine two controllable neural text generation methods, conditional training and weighted decoding, in order to control four important attributes for chitchat dialogue: repetition, specificity, response-relatedness and question-asking. We conduct a large-scale human evaluation to measure the effect of these control parameters on multi-turn interactive conversations on the PersonaChat task. We provide a detailed analysis of their relationship to high-level aspects of conversation, and show that by controlling combinations of these variables our models obtain clear improvements in human quality judgments.

show abstract

Recipes for building an open-domain chatbot

Roller¹,

Dinan²,

Goyal³

et al. 2020

Preprint

202

View full text Add to dashboard Cite

OPT: Open Pre-trained Transformer Language Models

Zhang¹,

Roller²,

Goyal³

et al. 2022

Preprint

108

164

View full text Add to dashboard Cite

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero-and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, 1 while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models. * Equal contribution. † Work done while at Meta AI. 1 Following Brown et al. (2020), we use GPT-3 to refer to both the 175B model and the smaller scale models as well.2 Exceptions include work by EleutherAI, who released dense models up to 20B in size (Black et al., 2022), Salesforce (Nijkamp et al., 2022), and Meta AI, who released dense models up to 13B and sparse models up to 1. 1T (Artetxe et al., 2021). There is also ongoing work from the BigScience workshop (https://bigscience. huggingface.co/), which aims to open source very large multilingual language models and datasets.

show abstract

Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora

Roller¹,

Kiela²,

Nickel³

2018

105

View full text Add to dashboard Cite

Methods for unsupervised hypernym detection may broadly be categorized according to two paradigms: pattern-based and distributional methods. In this paper, we study the performance of both approaches on several hypernymy tasks and find that simple pattern-based methods consistently outperform distributional methods on common benchmark datasets. Our results show that pattern-based models provide important contextual constraints which are not yet captured in distributional methods.

show abstract

Don’t Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

Li¹,

Roller²,

Kulikov³

et al. 2020

View full text Add to dashboard Cite

Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address. They tend to produce generations that (i) rely too much on copying from the context, (ii) contain repetitions within utterances, (iii) overuse frequent words, and (iv) at a deeper level, contain logical flaws. In this work we show how all of these problems can be addressed by extending the recently introduced unlikelihood loss (Welleck et al., 2019a) to these cases. We show that appropriate loss functions which regularize generated outputs to match human distributions are effective for the first three issues. For the last important general issue, we show applying unlikelihood to collected data of what a model should not do is effective for improving logical consistency, potentially paving the way to generative models with greater reasoning ability. We demonstrate the efficacy of our approach across several dialogue tasks.

show abstract

Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment

Roller¹,

Erk²

2016

View full text Add to dashboard Cite

We consider the task of predicting lexical entailment using distributional vectors. We perform a novel qualitative analysis of one existing model which was previously shown to only measure the prototypicality of word pairs. We find that the model strongly learns to identify hypernyms using Hearst patterns, which are well known to be predictive of lexical relations. We present a novel model which exploits this behavior as a method of feature extraction in an iterative procedure similar to Principal Component Analysis. Our model combines the extracted features with the strengths of other proposed models in the literature, and matches or outperforms prior work on multiple data sets.

show abstract

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

Roller

Papaxanthos

et al. 2019

View full text Add to dashboard Cite

We consider the task of inferring is-a relationships from large text corpora. For this purpose, we propose a new method combining hyperbolic embeddings and Hearst patterns. This approach allows us to set appropriate constraints for inferring concept hierarchies from distributional contexts while also being able to predict missing is-a-relationships and to correct wrong extractions. Moreover -and in contrast with other methods -the hierarchical nature of hyperbolic space allows us to learn highly efficient representations and to improve the taxonomic consistency of the inferred hierarchies. Experimentally, we show that our approach achieves state-of-the-art performance on several commonly-used benchmarks.

show abstract

12 3 4 5

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Stephen Roller

Recipes for Building an Open-Domain Chatbot

What makes a good conversation? How controllable attributes affect human judgments

Recipes for building an open-domain chatbot

OPT: Open Pre-trained Transformer Language Models

Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora

Don’t Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

Contact Info

Product

Resources

About