2013
DOI: 10.1016/j.ipm.2012.06.003

Authorship attribution based on a probabilistic topic model

Abstract: This paper describes, evaluates and compares the use of Latent Dirichlet allocation (LDA) as an approach to authorship attribution. Based on this generative probabilistic topic model, we can model each document as a mixture of topic distributions with each topic specifying a distribution over words. Based on author profiles (aggregation of all texts written by the same writer) we suggest computing the distance with a disputed text to determine its possible writer. This distance is based on the d…
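
The attribution step described in the abstract (compare a disputed text with each author profile and pick the closest) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's procedure: the abstract is cut off before it names the distance measure, so an L1 distance is used here purely as a stand-in, and the topic distributions are taken as already inferred by some LDA implementation. The function names, author labels, and numbers are all hypothetical.

```python
from math import fsum


def l1_distance(p, q):
    """L1 distance between two topic distributions of equal length.

    Stand-in measure only: the truncated abstract does not say which
    distance the paper uses.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    return fsum(abs(a - b) for a, b in zip(p, q))


def rank_authors(disputed, profiles):
    """Return (author, distance) pairs sorted from closest to farthest."""
    scored = [(author, l1_distance(disputed, dist))
              for author, dist in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1])


def attribute(disputed, profiles):
    """Return the author whose profile is closest to the disputed text."""
    return rank_authors(disputed, profiles)[0][0]


# Hypothetical topic distributions over three topics, assumed to have been
# inferred beforehand for each author profile and for the disputed text.
profiles = {
    "author_a": [0.7, 0.2, 0.1],
    "author_b": [0.1, 0.3, 0.6],
}
disputed = [0.6, 0.3, 0.1]

print(attribute(disputed, profiles))
```

Returning the full ranking rather than only the top candidate makes it easier to see how clearly the closest profile is separated from the others.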

Cited by 60 publications (27 citation statements)
References 22 publications
“…Representative solutions are Burger et al [2011], Nirkhi and Dharaskar [2014], and Cavalcante et al [2014]. Exceptions are few recent applications of the topic models that actually combine these two process into one [Pratanwanich and Liò 2014; Savoy 2013a; Seroussi et al 2014]. Still, the two-processes-based studies on authorship analysis problems dominate [Rangel et al 2014].…”
Section: Introduction (mentioning)
confidence: 99%
“…The majority of published works in authorship attribution focus on closed-set attribution where it is assumed that the author of the text under investigation is necessarily a member of a given well-defined set of candidate authors (Stamatatos et al, 2000; Gamon, 2004; Escalante et al, 2011; Schwartz et al, 2013; Savoy, 2013; Seroussi et al, 2014). This setting fits many forensic applications where usually specific individuals have access to certain resources, have knowledge of certain issues, etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…Structural features use the overall organization of the whole text, such as the length or number of sentences and paragraphs. Since lexical features are easy to extract and the result is usually unambiguous, they play the most important role in computational stylometry [17][19].…”
Section: Methods (mentioning)
confidence: 99%