Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), 2021
DOI: 10.18653/v1/2021.gem-1.2
Human Perception in Natural Language Generation

Abstract: We take a collection of short texts, some of which are human-written while others are automatically generated, and ask subjects, who are unaware of the texts' source, whether they perceive them as human-produced. We use this data to fine-tune a GPT-2 model to push it to generate more human-like texts, and observe that the production of this fine-tuned model is indeed perceived as more human-like than that of the original model. Contextually, we show that our automatic evaluation strategy correlates well with h…
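The paper reports fine-tuning GPT-2 on texts that judges perceived as human-produced; no code accompanies the abstract, so the following is a minimal sketch of that idea, assuming the Hugging Face transformers and datasets libraries. The `judged_texts` data, the 0.5 threshold, and all hyperparameters are illustrative placeholders, not values from the paper.

```python
# Minimal sketch (not the authors' code): continue training GPT-2 on the
# subset of texts that human judges perceived as human-produced, nudging
# the model toward more human-like output.
from datasets import Dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Placeholder annotations: (text, share of judges who called it "human").
judged_texts = [
    ("The council approved the budget after a long debate.", 0.8),
    ("Budget the approved council a after debate long the.", 0.1),
]
human_like = [t for t, score in judged_texts if score >= 0.5]  # illustrative cut-off

ds = Dataset.from_dict({"text": human_like}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-humanlike",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```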

Cited by 4 publications (4 citation statements)
References 12 publications (8 reference statements)

Citation statements:
“…This highlights the difficulty for humans in assessing the style strength, separating it from the structure and semantics. These findings are in line with recent studies in the field (De Mattei, Cafagna, Dell'Orletta, & Nissim, 2020).…”
Section: Comparing Automatic and Human Evaluations (supporting)
Confidence: 93%
“…De Mattei et al. (2020) put forward the idea that news styles are more difficult to judge than others (e.g., sentiment), and that humans are not as reliable judges of said styles as machines. They proposed a framework for the automatic evaluation of style-aware generation that seems handy for style transfer as well.…”
Section: Discussion (mentioning)
Confidence: 99%
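As an illustration of such a machine-judge framework (an assumed sketch, not the cited authors' implementation), one can train a classifier to separate the two newspapers' styles and score style-aware generations by how often they are attributed to the intended newspaper. The scikit-learn pipeline, the `paperA`/`paperB` labels, and the training texts below are hypothetical.

```python
# Sketch: a style classifier as an automatic judge of style-aware generation.
# All data and names are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Markets rallied as the central bank held interest rates steady.",
    "Three-goal thriller leaves the home fans roaring at the whistle!",
]
train_labels = ["paperA", "paperB"]  # which newspaper each text comes from

judge = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
judge.fit(train_texts, train_labels)

def style_strength(generated_texts, target_label):
    """Share of generated texts the judge attributes to the target newspaper."""
    preds = judge.predict(generated_texts)
    return sum(p == target_label for p in preds) / len(generated_texts)
```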
“…A useful newspaper dataset for style transfer was created by De Mattei et al. (2020), even though their work concerned style-aware generation rather than transfer. They collected news articles that are lexically similar from two newspapers, a subset of which are topic-aligned.…”
Section: Intended Styles (mentioning)
Confidence: 99%
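A rough sketch of that kind of collection step (hypothetical, not the authors' pipeline): pair articles from the two newspapers by lexical overlap and keep the most similar pairs as candidate topic-aligned data. The similarity threshold and the toy articles are assumptions.

```python
# Sketch: pair lexically similar articles from two newspapers via TF-IDF
# cosine similarity. Data and threshold are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

paper_a = ["Parliament passed the reform bill on Tuesday after heated debate."]
paper_b = ["The reform bill cleared parliament this week despite protests."]

vec = TfidfVectorizer().fit(paper_a + paper_b)
sims = cosine_similarity(vec.transform(paper_a), vec.transform(paper_b))

# For each article in paper A, its most lexically similar counterpart in B;
# pairs above the threshold are candidate topic-aligned examples.
pairs = [(i, int(sims[i].argmax()), float(sims[i].max()))
         for i in range(len(paper_a))]
aligned = [(i, j, s) for i, j, s in pairs if s >= 0.5]  # illustrative cut-off
```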
“…Reference-free evaluation. A popular, reference-free alternative is to train evaluation models that discriminate human from model output (e.g., Bruni and Fernández, 2017; Gehrmann et al., 2019; Hashimoto et al., 2019), score the appropriateness of input-output pairs (e.g., Sinha et al., 2020; Fomicheva et al., 2020), or model human judgements directly (e.g., Lowe et al., 2017; De Mattei et al., 2021; Rei et al., 2021). Neural language models themselves have been proposed as evaluators (e.g., Yuan et al., 2021; Deng et al., 2021) and used to assess generations along interpretable evaluation dimensions (Zhong et al., 2022), yet they have been criticised for being biased (toward models similar to the evaluator) and thus limited in their ability to evaluate generated text (Deutsch et al., 2022).…”
Section: Related Work (mentioning)
Confidence: 99%
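The discriminator idea mentioned first in this excerpt can be sketched in a few lines (an assumed toy implementation, not any cited paper's method): train a binary classifier on human versus model-generated text and use its probability of "human" as a reference-free score. The corpora below are placeholders.

```python
# Sketch: human-vs-model discriminator as a reference-free evaluator.
# Training data are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_texts = ["The council approved the budget after a long debate."]
model_texts = ["The budget council approved is very approved and nice."]
X = human_texts + model_texts
y = [1] * len(human_texts) + [0] * len(model_texts)  # 1 = human-written

disc = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
disc.fit(X, y)

def human_likeness(texts):
    """Mean discriminator probability that the texts are human-written."""
    return disc.predict_proba(texts)[:, 1].mean()
```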