Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2006
DOI: 10.1145/1148170.1148304

Authorship attribution with thousands of candidate authors

Abstract: In this paper, we use a blog corpus to demonstrate that we can often identify the author of an anonymous text even where there are many thousands of candidate authors. Our approach combines standard information retrieval methods with a text categorization meta-learning scheme that determines when to even venture a guess.
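
The abstract's idea of ranking candidate authors with retrieval-style similarity, and declining to answer when the evidence is weak, can be illustrated with a minimal sketch. This is not the authors' implementation: the tf-idf weighting, the function names, and the fixed `min_margin` threshold are all assumptions made for illustration, and the threshold is a simple stand-in for the meta-learning scheme the paper describes.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vectors, idf): one sparse tf-idf dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf; an illustrative choice, not taken from the paper.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(snippet, known_texts, min_margin=0.05):
    """Return the best-matching author, or None to abstain.

    known_texts maps each candidate author to a sample of their writing.
    min_margin is a hypothetical fixed threshold on the gap between the
    top two similarity scores.
    """
    authors = list(known_texts)
    docs = [known_texts[a].lower().split() for a in authors]
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(snippet.lower().split())
    # Terms unseen in every candidate's text get zero weight.
    query = {t: tf * idf.get(t, 0.0) for t, tf in counts.items()}
    ranked = sorted(((cosine(query, v), a) for v, a in zip(vectors, authors)),
                    reverse=True)
    best_score, best_author = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    if best_score == 0.0 or best_score - runner_up < min_margin:
        return None  # evidence too weak: do not venture a guess
    return best_author
```

With thousands of candidates the same ranking step would normally run against an inverted index rather than a linear scan, and the decision to abstain would be learned from data rather than fixed by hand.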

Citations: cited by 67 publications (49 citation statements)
References: 4 publications
“…Beyond literature, several evaluation corpora for authorship attribution studies have been built covering certain text domains such as online newspaper articles (Stamatatos, et al, 2000; Diederich, et al, 2003; Luyckx & Daelemans, 2005; Sanderson & Guenter, 2006), e-mail messages (de Vel, et al, 2001; Koppel & Schler, 2003), online forum messages (Argamon, et al, 2003; Abbasi & Chen, 2005; Zheng, et al, 2006), newswire stories (Khmelev & Teahan, 2003a; Zhao & Zobel, 2005), blogs (Koppel, Schler, Argamon, & Messeri, 2006), etc. Alternatively, corpora built for other purposes have also been used in the framework of authorship attribution studies including parts of the Reuters-21578 corpus (Teahan & Harper, 2003; Marton, et al, 2005), the Reuters Corpus Volume 1 (Khmelev & Teahan, 2003a; Madigan, et al, 2005; Stamatatos, 2007) and the TREC corpus (Zhao & Zobel, 2005) that were initially built for evaluating thematic text categorization tasks.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Emphasis is now given to the objective evaluation of the proposed methods as well as the comparison of different methods based on common benchmark corpora (Juola, 2004). In addition, factors playing a crucial role in the accuracy of the produced models are examined, such as the training text size (Marton, Wu, & Hellerstein, 2005; Hirst & Feiguina, 2007), the number of candidate authors (Koppel, Schler, Argamon, & Messeri, 2006), and the distribution of training texts over the candidate authors (Stamatatos, 2008).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…We use meta-learning to identify such cases and find that in the remaining cases, where the system believes attribution is reliable, we are able to provide highly accurate results. The discussion in Section 7 is an expansion of that given in Koppel et al (2006c).…”
Section: Variations On the Basic Attribution Problem
Citation type: mentioning
confidence: 99%
“…Experiments were conducted with support vector machine classifiers in twenty novels and success rates above 90% were obtained. The use of functional words is a valid and good approach in attribution of authorship [Koppel 2006]. A success rate of 65% and 72% has been measured in the study for authorship recognition, which is an implementation of multiple regression and discriminant analysis [Stamatatos et al, 2000].…”
Section: Related Work
Citation type: mentioning
confidence: 99%