2012
DOI: 10.1080/0013838x.2012.668793
Cross-Genre Authorship Verification Using Unmasking

Abstract: In this paper we will stress-test a recently proposed technique for computational authorship verification, "unmasking", which has been well received in the literature. The technique envisages an experimental setup commonly referred to as "authorship verification", a task generally deemed more difficult than so-called "authorship attribution". We will apply the technique to authorship verification across genres, an extremely complex text categorization problem that so far has remained unexplored. We focus…

Cited by 46 publications (40 citation statements) · References 15 publications
“…Based on another small corpus (2 authors and 3 topics), Madigan et al. (2005) demonstrated that POS features are more effective than word unigrams in cross-topic conditions. The unmasking method for author verification of long documents, based on the frequencies of very frequent words, was successfully tested in cross-topic conditions (Koppel et al., 2007), but Kestemont et al. (2012) found that its reliability was significantly lower in cross-genre conditions. Function words have been found to be effective when topics of the test corpus are excluded from the training corpus (Baayen et al., 2002; Goldstein-Stewart et al., 2009; Menon and Choi, 2011).…”
Section: Related Work · Type: mentioning · Confidence: 99%
“…In most applications, there are certain restrictions that do not allow the construction of a representative training corpus. Unlike other text categorization tasks, a recent trend in authorship attribution research is to build cross-genre and cross-topic models, meaning that the training and test corpora do not share the same properties (Kestemont et al., 2012; Stamatatos, 2013; Sapkota et al., 2014; Stamatatos et al., 2015).…”
Section: Introduction · Type: mentioning · Confidence: 99%
“…We tested whether the approach works for the cross-genre authorship verification task, in the expectation that the genre markers would be limited and superficial and would therefore be among the first to be discarded in the unmasking approach, leading to a clear degradation curve indicative of same authorship. We refer to the paper [23] for a detailed description of the operationalization of the unmasking approach for our cross-genre case. We applied the approach to the theatre and prose texts of five authors.…”
Section: Cross-genre Stylometry · Type: mentioning · Confidence: 99%
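The degradation-curve logic in the statement above can be made concrete with a short sketch. The following is a minimal, illustrative implementation of the general unmasking procedure (repeatedly training a linear classifier to tell the chunks of two documents apart, then discarding the most discriminative features), not the exact setup of Koppel et al. (2007) or Kestemont et al. (2012); the chunk size, feature count, and per-round elimination rate are assumed values for illustration.

```python
# Minimal sketch of unmasking, assuming two long documents and
# illustrative hyperparameters (not the settings of the cited papers).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def chunk(text, size=500):
    """Split a document into consecutive chunks of `size` tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size])
            for i in range(0, len(tokens) - size + 1, size)]

def unmasking_curve(doc_a, doc_b, n_features=250, drop_per_round=10, n_rounds=10):
    """Return cross-validated accuracies per round as the most
    discriminative features are iteratively removed."""
    a_chunks, b_chunks = chunk(doc_a), chunk(doc_b)
    y = np.array([0] * len(a_chunks) + [1] * len(b_chunks))
    # Top-frequency words as features, as in frequent-word unmasking.
    vec = CountVectorizer(max_features=n_features)
    X = vec.fit_transform(a_chunks + b_chunks).toarray().astype(float)
    X /= X.sum(axis=1, keepdims=True)  # relative frequencies per chunk
    active = np.arange(X.shape[1])     # indices of features still in play
    curve = []
    for _ in range(n_rounds):
        clf = LinearSVC()
        # Assumes enough chunks per document for 5-fold cross-validation.
        curve.append(cross_val_score(clf, X[:, active], y, cv=5).mean())
        clf.fit(X[:, active], y)
        # Drop the features with the largest absolute SVM weights,
        # i.e. those that best separate the two documents.
        strongest = np.argsort(np.abs(clf.coef_[0]))[-drop_per_round:]
        active = np.delete(active, strongest)
    return curve
```

On a same-author pair the accuracy typically collapses within a few rounds (only superficial markers, e.g. of genre or topic, separated the texts), whereas for a different-author pair it degrades slowly; the shape of this curve is what a verification decision is then based on.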
“…In a recent study [23], we tackled both the problem of verification (rather than attribution, i.e. the open case) and the problem of cross-genre generalization.…”
Section: Cross-genre Stylometry · Type: mentioning · Confidence: 99%
“…In authorship studies, there is nowadays a general consensus that features related to style are more useful (Juola, 2006; Koppel et al., 2009; Stamatatos, 2009b), since topical, content-related features vary much more strongly across the documents authored by a single individual. Much research nowadays therefore concerns ways to effectively extract stylistic characteristics from documents that are not affected by a text's specific content or genre (Argamon & Levitan, 2005; Kestemont et al., 2012; Efstathios, 2013; Sapkota et al., 2015; Seroussi et al., 2014; Sapkota et al., 2014). This has not always been the case: historical practitioners in earlier centuries commonly based attributions on a much more loosely defined set of linguistic criteria, including, for instance, the use of conspicuous, rare words (Love, 2002; Kestemont, 2014).…”
Type: mentioning · Confidence: 99%