Style transfer is the task of transferring an attribute of a sentence (e.g., formality) while maintaining its semantic content. The key challenge in style transfer is to strike a balance between two competing goals: preserving meaning and improving transfer accuracy. Prior research has found that meaning preservation is generally the harder of the two to attain and to evaluate. This paper proposes two extensions of state-of-the-art style transfer models aimed at improving meaning preservation. Our evaluation shows that these extensions help ground meaning better while also improving transfer accuracy.
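As a minimal sketch of the trade-off this abstract describes, style transfer systems are often trained on a weighted combination of a content (meaning-preservation) loss and a style (transfer-accuracy) loss. The function below is a generic illustration of that balancing act, not the paper's proposed model; the weight `lam` is a hypothetical knob.

```python
def combined_loss(content_loss, style_loss, lam=1.0):
    """Weighted objective: larger lam favors meaning preservation,
    smaller lam favors style transfer accuracy."""
    return style_loss + lam * content_loss

# Doubling lam makes a given content error twice as costly:
print(combined_loss(content_loss=0.8, style_loss=0.3, lam=2.0))  # 1.9
```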
We describe the setting and results of the ConvAI2 NeurIPS competition, which aims to further the state of the art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best-performing models on this task; (ii) to improve performance on multi-turn conversations with humans, future systems must go beyond single-word metrics like perplexity and measure performance across sequences of utterances (conversations) in terms of repetition, consistency, and the balance of dialogue acts (e.g., how many questions are asked vs. answered).
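To make the idea of sequence-level metrics concrete, here is an illustrative sketch (not the competition's official scoring code) of two such measures named in the abstract: repetition across utterances and the share of turns that ask a question.

```python
def repetition_rate(utterances):
    """Fraction of later utterances that repeat an earlier one verbatim."""
    seen, repeats = set(), 0
    for u in utterances:
        key = u.strip().lower()
        if key in seen:
            repeats += 1
        seen.add(key)
    return repeats / max(len(utterances) - 1, 1)

def question_balance(utterances):
    """Share of utterances that ask a question (crude '?' heuristic)."""
    asked = sum("?" in u for u in utterances)
    return asked / max(len(utterances), 1)

conversation = ["Hi! What do you do for fun?",
                "I like hiking.",
                "Hi! What do you do for fun?"]
print(repetition_rate(conversation))   # 0.5: one of the two later turns repeats
print(question_balance(conversation))  # ~0.67: two of three turns ask a question
```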
This paper introduces a document-grounded dataset for conversations. We define "Document Grounded Conversations" as conversations about the contents of a specified document; in this dataset, the specified documents are Wikipedia articles about popular movies. The dataset contains 4,112 conversations with an average of 21.43 turns per conversation. The dataset thus provides not only a relevant chat history for generating responses but also a source of information that models can draw on. We describe two neural architectures that provide benchmark performance on the task of generating the next response. We also evaluate our models for engagement and fluency, and find that the information from the document helps generate more engaging and fluent responses.
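A hypothetical sketch of how one record in such a document-grounded dataset might be represented follows; the field names and example are illustrative, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GroundedConversation:
    document: str     # e.g., text of a Wikipedia article about a movie
    turns: List[str]  # utterances, each grounded in chat history + document

def average_turns(conversations: List[GroundedConversation]) -> float:
    """Mean turns per conversation (the abstract reports 21.43 over 4,112)."""
    return sum(len(c.turns) for c in conversations) / len(conversations)

convo = GroundedConversation(
    document="Inception is a 2010 science fiction film...",
    turns=["Have you seen Inception?", "Yes! The 2010 one, right?"],
)
print(average_turns([convo]))  # 2.0
```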
Pretrained general-purpose language models can achieve state-of-the-art accuracies across natural language processing domains by adapting to downstream tasks via zero-shot, few-shot, and finetuning techniques. Because of their success, the size of these models has increased rapidly, requiring high-performance hardware, software, and algorithmic techniques to enable training such large models. As a result of a joint effort between Microsoft and NVIDIA, we present details on the training of the largest monolithic transformer-based language model, Megatron-Turing NLG 530B (MT-NLG), with 530 billion parameters. In this paper, we first focus on the infrastructure as well as the 3D parallelism methodology used to train this model with DeepSpeed and Megatron. Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe are key ingredients in the model's success. Finally, we discuss various evaluation results, as well as other interesting observations and new properties exhibited by MT-NLG. We demonstrate that MT-NLG achieves superior zero-, one-, and few-shot learning accuracies on several NLP benchmarks and establishes new state-of-the-art results. We believe that our contributions will help further the development of large-scale training infrastructure, large-scale language models, and natural language generation.
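Some back-of-the-envelope arithmetic helps explain why a 530B-parameter model needs the 3D (tensor x pipeline x data) parallelism the abstract names. The parallelism degrees below are example values for illustration, not MT-NLG's published configuration.

```python
params = 530e9
weight_bytes = params * 2   # ~1.06 TB of weights alone in 16-bit precision
print(weight_bytes / 80e9)  # ~13.25 A100-80GB GPUs just to hold the weights

tensor_parallel = 8         # each layer's matrices split across 8 GPUs
pipeline_parallel = 35      # layers split into 35 sequential stages
data_parallel = 12          # 12 model replicas over different data shards
print(tensor_parallel * pipeline_parallel * data_parallel)  # 3360 GPUs total
```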
This paper introduces the new task of politeness transfer, which involves converting non-polite sentences to polite sentences while preserving their meaning. We also provide a dataset of more than 1.39 million instances automatically labeled for politeness to encourage benchmark evaluations on this new task. We design a tag-and-generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style while preserving most of the source content. For politeness as well as five other transfer tasks, our model outperforms state-of-the-art methods on automatic metrics for content preservation, with comparable or better performance on style transfer accuracy. Additionally, our model surpasses existing methods in human evaluations of grammaticality, meaning preservation, and transfer accuracy across all six style transfer tasks. The data and code are located at https://github.com/tag-and-generate/

Introduction

Politeness plays a crucial role in social interaction and is closely tied to power dynamics, the social distance between the participants of a conversation, and gender (Brown et al., 1987; Danescu-Niculescu-Mizil et al., 2013). It is also imperative to use the appropriate level of politeness for smooth communication in conversations (Coppock, 2005), organizational settings like emails (Peterson et al., 2011), memos, official documents, and many other settings. Notably, politeness has also been identified as an interpersonal style which can be decoupled from content (Kang and Hovy, 2019). Motivated by its central importance, in this paper we study the task of converting non-polite sentences to polite sentences while preserving their meaning. Prior work on text style transfer (
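To illustrate the two-stage tag-and-generate idea from the abstract above, here is a toy sketch: a tagger marks style-bearing tokens with a placeholder, and a generator realizes the placeholders in the target style. The lexicon and template below are hypothetical stand-ins for the paper's learned tagger and generator models.

```python
IMPOLITE_MARKERS = {"now", "immediately", "asap"}  # toy attribute lexicon
POLITE_FILLER = "when you get a chance"            # toy generator output

def tag(sentence):
    """Stage 1: replace style-bearing tokens with a [TAG] placeholder."""
    return " ".join("[TAG]" if t.lower().strip(".,!") in IMPOLITE_MARKERS else t
                    for t in sentence.split())

def generate(tagged):
    """Stage 2: realize placeholders in the target (polite) style."""
    filled = tagged.replace("[TAG]", POLITE_FILLER)
    return "Could you please " + filled[0].lower() + filled[1:].rstrip(".") + "?"

print(tag("Send me the report now."))
# Send me the report [TAG]
print(generate(tag("Send me the report now.")))
# Could you please send me the report when you get a chance?
```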