Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining 2005
DOI: 10.1145/1081870.1081947

Determining an author's native language by mining a text for errors

Abstract: In this paper, we show that stylistic text features can be exploited to determine an anonymous author's native language with high accuracy. Specifically, we first use automatic tools to ascertain frequencies of various stylistic idiosyncrasies in a text. These frequencies then serve as features for support vector machines that learn to classify texts according to author native language.
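The abstract's idea of turning a text into a vector of frequencies that a classifier can learn from can be illustrated with a minimal sketch. This is not the authors' feature set or tooling: the short function-word list, the `function_word_frequencies` helper, and the sample sentence are all hypothetical, chosen only to show what "frequencies as features" means in practice.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; the paper's actual features are not listed in this excerpt).
FUNCTION_WORDS = ["the", "a", "of", "in", "to", "and", "that", "is"]


def function_word_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in `text`.

    The result is a fixed-length list, one value per vocabulary word,
    in vocabulary order, suitable as a feature vector for a classifier.
    """
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        # Empty text: every frequency is zero.
        return [0.0] * len(vocabulary)
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


sample = "The cat sat in the hat, and the dog is in the yard."
features = function_word_frequencies(sample)
```

Each text then becomes one row of numbers with the same length and column order, which is the shape a support vector machine implementation (for example, scikit-learn's `sklearn.svm.LinearSVC`, named here as an assumption rather than as the tool the authors used) would expect as training input, alongside a native-language label for each text.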

Cited by 101 publications (129 citation statements)
References: 8 publications
“…Many studies since that of Mosteller and Wallace have shown the efficacy of function words for authorship attribution in different scenarios (Morton 1978; Burrows 1987; Karlgren & Cutting 1994; Merriam & Matthews 1994; Kessler et al 1997; Argamon et al 1998; Holmes 1998; de Vel et al 2001; Holmes et al 2001a, 2001b; Baayen et al 2002; Binongo 2003; Juola & Baayen 2003; Zhao & Zobel 2005; Argamon & Levitan 2005; Koppel et al 2005, 2006a), confirming the hypothesis that different authors tend to have different characteristic patterns of function word use.…”
Section: Function Words (citation type: mentioning)
confidence: 82%
“…More recently, Graham et al (2005) and Zheng et al (2006) used neural networks on a wide variety of features. Other studies used k-nearest neighbor (Kjell et al 1995; Hoorn et al 1999; Zhao & Zobel 2005), Naive Bayes (Kjell 1994a; Hoorn et al 1999; Peng et al 2004), rule learners (Holmes & Forsyth 1995; Holmes 1998; Argamon et al 1998; Koppel & Schler 2003; Abbasi & Chen 2005; Zheng et al 2006), support vector machines (De Vel et al 2001; Diederich et al 2003; Koppel & Schler 2003; Abbasi & Chen 2005; Koppel et al 2005; Zheng et al 2006), Winnow (Koppel et al 2002; Argamon et al 2003; Koppel et al 2006a), and Bayesian regression (Madigan et al 2006; Argamon et al 2008). Further details regarding these studies can be found in the Appendix.…”
Section: Machine Learning Approach (citation type: mentioning)
confidence: 99%