Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.673

Authorship Attribution for Neural Text Generation

Abstract: In recent years, the task of generating realistic short and long texts has made tremendous advancements. In particular, several recently proposed neural network-based language models have demonstrated astonishing capabilities to generate texts that are challenging to distinguish from human-written texts with the naked eye. Despite the many benefits and utilities of such neural methods, in some applications being able to tell the "author" of a text in question becomes critically important. In this work, in …

Cited by 56 publications (70 citation statements). References 30 publications (24 reference statements).
“…They show that word order does not matter much, as a bag-of-words detector performs very similarly to detectors based on complex encoders (e.g., transformers). This result is consistent with recent work by Uchendu et al. (2020), which shows that simple models (traditional ML models trained on psychological features, and simple neural network architectures) perform well in three settings: (i) classify whether two given articles are generated by the same TGM; (ii) classify whether a given article is written by a human or a TGM (the original detection problem); (iii) identify the TGM that generated a given article (similar to Tay et al. (2020)). For the original detection problem, the authors find the text generated by the GPT-2 model to be hard to detect among several TGMs (see Appendix for the list of studied TGMs).…”
Section: Classifiers Trained From Scratch (supporting)
confidence: 94%
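To make the bag-of-words detector in this citation statement concrete, here is a minimal hypothetical sketch in Python with scikit-learn of setting (ii), classifying human-written vs. TGM-generated text. The toy corpus, feature choices, and classifier are illustrative assumptions, not the pipeline from either cited paper.

# Minimal sketch (assumes scikit-learn is installed): a bag-of-words
# detector for setting (ii), human-written (label 0) vs. TGM-generated
# (label 1) text. Illustrative only; not the cited papers' pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; a real experiment would use thousands of
# labeled articles per class.
texts = [
    "The senate passed the bill after a lengthy floor debate.",      # human
    "Local officials praised the new bridge at its opening.",        # human
    "The bill was passed by the senate by the senate on the bill.",  # TGM
    "The bridge opened the bridge and officials the bridge said.",   # TGM
]
labels = [0, 0, 1, 1]

# TF-IDF discards word order entirely: this is the bag-of-words
# simplification that the citation statement reports performing close
# to transformer-based detectors.
detector = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

# Score an unseen article; predict_proba gives P(TGM-generated).
print(detector.predict_proba(["Officials said the senate debate was lengthy."])[0, 1])

Settings (i) and (iii) fit the same template: only the labels change, to "same TGM or not" for pairs of articles, or to TGM identities for attribution.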
“…TGMs can also be used to generate text that approximately matches the style of human language, which benefits applications such as story generation (Fan et al., 2018), conversational response generation (Zhang et al., 2020), code auto-completion (TabNine, 2020), and radiology report generation (Liu et al., 2019a). Malicious usage: TGMs can also be put to unfortunate use by (even low-skilled) adversaries for malicious purposes, such as fake news generation (Zellers et al., 2019; Brown et al., 2020; Uchendu et al., 2020), fake product review generation (Adelani et al., 2020), and spamming/phishing (Weiss, 2019). Humans can spot fake news articles (Brown et al., 2020), fake product reviews (Adelani et al., 2020), and fake comments (Weiss, 2019) generated by TGMs only at chance level.…”
Section: Social Impacts of TGMs (mentioning)
confidence: 99%