2006
DOI: 10.1007/11892755_87

Authorship Attribution Using Word Sequences

Abstract: Authorship attribution is the task of identifying the author of a given text. The main concern of this task is to define an appropriate characterization of documents that captures the writing style of authors. This paper proposes a new method for authorship attribution based on the idea that a proper identification of authors must consider both stylistic and topic features of texts. This method characterizes documents by a set of word sequences that combine functional and content words. The exper…

Cited by 61 publications (27 citation statements)
References: 6 publications
“…To take advantage of contextual information, word n-grams (n contiguous words aka word collocations) have been proposed as textual features (Peng, et al, 2004; Sanderson & Guenther, 2006; Coyotl-Morales, Villaseñor-Pineda, Montes-y-Gómez, & Rosso, 2006). However, the classification accuracy achieved by word n-grams is not always better than individual word features (Sanderson & Guenther, 2006; Coyotl-Morales, et al, 2006). The dimensionality of the problem following this approach increases considerably with n to account for all the possible combinations between words.…”
Section: Lexical Features (mentioning)
confidence: 99%
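
The point about dimensionality in the statement above can be made concrete with a small sketch. This is an illustration only, not the method of the cited paper: the function names `word_ngrams` and `ngram_counts`, the lowercasing, the whitespace tokenization, and the toy sentence are all assumptions chosen for brevity.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all runs of n contiguous words as tuples.

    Assumption: tokens are lowercased and split on whitespace;
    a real system would use a proper tokenizer.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count how often each distinct word n-gram occurs in the text."""
    return Counter(word_ngrams(text, n))


# Hypothetical toy input, used only to show how the feature set changes with n.
sample = "the cat sat on the mat and the cat slept on the mat"

for n in (1, 2, 3):
    counts = ngram_counts(sample, n)
    print(f"n={n}: {len(counts)} distinct features")
```

On this toy sentence the number of distinct features rises as n grows, while each individual feature becomes rarer. With a vocabulary of V words there are up to V^n possible word n-grams, which is the growth the statement refers to.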
“…Word n-grams can represent local structure of texts and document topic (Coyotl-Morales et al, 2006; Wang and Manning, 2012). On the other hand, character n-grams have been shown to be effective for capturing stylistic and morphological information (Koppel et al, 2011; Sapkota et al, 2015).…”
Section: Introduction (mentioning)
confidence: 99%
“…We use the corpus of contemporary Mexican poets used in [19]. This corpus was gathered from the Web.…”
Section: A Experimental Setup (mentioning)
confidence: 99%