2018
DOI: 10.1007/s11192-018-2661-6

Authorship identification of documents with high content similarity

Abstract: The goal of our work is inspired by the task of associating segments of text to their real authors. In this work, we focus on analyzing the way humans judge different writing styles. This analysis can help to better understand this process and thus to simulate/mimic such behavior accordingly. Unlike the majority of the work done in this field (i.e. authorship attribution, plagiarism detection, etc.), which uses content features, we focus only on the stylometric, i.e. content-agnostic, characteristics of author…

Citations: cited by 35 publications (22 citation statements)
References: 14 publications (12 reference statements)
“…The assumption is that documents or texts clustered together are more likely to be written by the same author. Rexha, Kröll, Ziak, and Kern (2018) explain that authorship recognition can be done using document clustering where the author of a disputed or controversial text can be identified from a set of candidate authors. Theodoridis and Koutroubas (2003) suggest that text clustering is one of the most primitive mental activities of humans.…”
Section: Methodology (Methods) | mentioning
confidence: 99%
“…The text analysis is field with different topic as the linguistic [[206], [207], [208]], the stylometry [209], and text classification [210].…”
Section: Miscellaneous | mentioning
confidence: 99%
“…As automation spreads across every field, the meaning of stylometry has shifted, and Halvani [1] defines it as "the branch of science that determines the authorship of written works through statistical analysis and machine learning". Stylometric analysis is widely applied in more complex computational applications, such as authorship identification by Rexha et al. [2], author attribution and diarization by Stamatatos et al. [3], and Intrinsic Plagiarism Detection (IPD) by Rexha [1] and Kuznetsov et al. [4].…”
Section: Introduction | unclassified
“…In addition, most intrinsic plagiarism detection (IPD) systems use several features at once rather than relying on a single feature. Other commonly used stylometric features are word length frequency [2], [12], sentence length frequency [12], [13], part-of-speech (POS) tag frequency [12], type-token ratio [2], [13], and word specificity frequency [12].…”
Section: Introduction | unclassified
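
Three of the stylometric features named in the statement above (type-token ratio, word length frequency, and sentence length frequency) can be illustrated with a minimal sketch. This is not the method of the cited paper or of the citing works: the function names, the regex tokenizer, and the punctuation-based sentence splitter are all assumptions chosen for brevity.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercased ASCII words, optional inner apostrophe.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    # Distinct word types divided by total tokens; 0.0 for empty input.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def word_length_frequency(tokens):
    # Relative frequency of each word length, in characters.
    counts = Counter(len(t) for t in tokens)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def sentence_length_frequency(text):
    # Relative frequency of each sentence length, in tokens.
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(len(tokenize(s)) for s in sentences)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


sample = "The cat sat. The dog ran away!"
tokens = tokenize(sample)
print(type_token_ratio(tokens))
print(word_length_frequency(tokens))
print(sentence_length_frequency(sample))
```

All three features depend only on how the text is written, not on its topic, which is why they count as content-agnostic. A real system would likely use a proper tokenizer and sentence segmenter in place of the regular expressions assumed here.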