2020
DOI: 10.48550/arXiv.2004.02990
Preprint

Evaluating the Evaluation of Diversity in Natural Language Generation

Abstract: Despite growing interest in natural language generation (NLG) models that produce diverse outputs, there is currently no principled method for evaluating the diversity of an NLG system. In this work, we propose a framework for evaluating diversity metrics. The framework measures the correlation between a proposed diversity metric and a diversity parameter, a single parameter that controls some aspect of diversity in generated text. For example, a diversity parameter might be a binary variable used to instruct …
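As a rough sketch of the framework the abstract outlines, the snippet below scores a candidate diversity metric by how strongly it correlates with a diversity parameter. The metric (distinct-n), the parameter (sampling temperature), and the toy response sets are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the evaluation idea from the abstract: a candidate
# diversity metric is judged by how well it correlates with a diversity
# parameter that is known to control diversity. The metric (distinct-n),
# the parameter (sampling temperature), and the toy response sets below
# are illustrative assumptions, not the paper's exact setup.
from scipy.stats import spearmanr


def distinct_n(responses, n=2):
    """Fraction of unique n-grams across a set of responses."""
    ngrams = []
    for text in responses:
        tokens = text.split()
        ngrams.extend(zip(*(tokens[i:] for i in range(n))))
    return len(set(ngrams)) / max(len(ngrams), 1)


# Toy response sets generated at increasing temperatures (the diversity
# parameter): higher temperature should yield more diverse text.
temperatures = [0.2, 0.5, 0.8, 1.1]
response_sets = [
    ["the cat sat", "the cat sat", "the cat sat"],
    ["the cat sat", "a cat sat down", "the cat sat"],
    ["the cat sat", "a dog ran by", "birds flew away"],
    ["quiet rivers bend", "a dog ran by", "storms echo loudly"],
]

scores = [distinct_n(rs) for rs in response_sets]
rho, _ = spearmanr(temperatures, scores)
print(f"Spearman correlation between metric and parameter: {rho:.2f}")
```

Under this framing, a metric whose score rises consistently with the parameter earns a high correlation, while a metric insensitive to the diversity change scores near zero.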

Cited by 8 publications (9 citation statements)
References 20 publications (36 reference statements)
“…We conclude that the low-diversity problem is mainly manifested in two aspects: form and content (Tevet & Berant, 2020; Fu et al., 2020; Holtzman et al., 2020). As shown in Table 1, the low form diversity can be reflected in repeating some words, using similar lexicon and syntax, and more.…”
Section: Introduction
Mentioning (confidence: 70%)
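To make the "form diversity" symptom named in this excerpt concrete, here is a toy measure of word repetition; rep_rate is a hypothetical helper written for illustration, not a metric from the cited papers.

```python
# Toy illustration of one low-form-diversity symptom named above: word
# repetition. rep_rate is a hypothetical measure (not from the cited
# papers): the fraction of tokens that repeat an earlier token in the text.
def rep_rate(text: str) -> float:
    tokens = text.split()
    seen, repeats = set(), 0
    for tok in tokens:
        if tok in seen:
            repeats += 1
        seen.add(tok)
    return repeats / max(len(tokens), 1)


print(rep_rate("the cat sat and the cat sat"))  # ~0.43: 3 of 7 tokens repeat
```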
“…It could be argued that a desirable captioning system is one that generates both plausible and diverse text. However, it has been documented that the diversity of generated text can be at odds with the performance of captioning systems (Dušek et al., 2020; Tevet & Berant, 2020). In previous sections, we made the case for plausibility.…”
Section: Quantifying Diversity of Generated Multilingual Reports
Mentioning (confidence: 99%)
“…A single annotator completes each step to minimize cognitive load; rather than read and characterise a partial set of existing responses, an annotator must only reason about the set of responses they will write. Annotators are free to mix both surface and semantic diversity (Tevet and Berant 2020). We perform manual quality control by checking a sample of work from each annotator and conversation tree during each round of annotation.…”
Section: Conversation Turns
Mentioning (confidence: 99%)