Author verification by linguistic profiling

Halteren, H. van

doi:10.1145/1187415.1187416

Cited by 64 publications

(10 citation statements)

References 4 publications

“…In one implementation of the hybrid sampling approach, all training text samples for an author are handled separately, as in the instance-based approach. All samples from each author are then combined as an average of the feature vectors to produce a single profile vector [39,40]. In the implementation of another hybrid sampling approach, the reverse process of the previous order was applied; the profile sample is first produced by combining all of the training samples for each author and is then divided to obtain segments of the same size [4,41].…”
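The two hybrid sampling variants described in this statement can be sketched roughly as follows. This is a minimal illustration, not the cited authors' code: the function names `profile_from_samples` and `segments_from_profile` are hypothetical, features are assumed to be plain lists of floats, texts are assumed to be token lists, and dropping the trailing remainder is an assumption made to keep segments the same size.

```python
from statistics import fmean


def profile_from_samples(sample_vectors):
    """Variant 1: extract features per sample, then average them
    element-wise into a single profile vector for the author."""
    if not sample_vectors:
        raise ValueError("need at least one sample vector")
    width = len(sample_vectors[0])
    if any(len(v) != width for v in sample_vectors):
        raise ValueError("all feature vectors must have the same length")
    return [fmean(v[i] for v in sample_vectors) for i in range(width)]


def segments_from_profile(samples, segment_size):
    """Variant 2: concatenate all of an author's samples first, then cut
    the result into equal-size segments (trailing remainder is dropped)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    tokens = [tok for sample in samples for tok in sample]
    last_start = len(tokens) - segment_size
    return [tokens[i:i + segment_size]
            for i in range(0, last_start + 1, segment_size)]


profile = profile_from_samples([[1.0, 2.0], [3.0, 4.0]])
segments = segments_from_profile([["a", "b", "c"], ["d", "e"]], 2)
```

In the first variant the unit of analysis stays the individual sample until the final averaging step; in the second, sample boundaries are discarded before segmentation.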

Section: The Proposed BBM-assisted 2D-AV System (mentioning)

confidence: 99%

Binary background model with geometric mean for author-independent authorship verification

Canbay

Sezer

Sever

2021

Journal of Information Science

Authorship verification (AV) is one of the main problems of authorship analysis and digital text forensics. The classical AV problem is to decide whether or not a particular author wrote the document in question. However, if the author’s only known document is a single, relatively short one, the verification problem becomes more difficult than classical AV and needs a generalised solution. To decide AV for two given unlabeled documents (2D-AV), we proposed a system that provides an author-independent solution with the help of a Binary Background Model (BBM). The BBM is a supervised model that provides an informative background to distinguish document pairs written by the same or different authors. To evaluate the document pairs in one representation, we also proposed a new, simple and efficient document combination method based on the geometric mean of the stylometric features. We tested the performance of the proposed system for both author-dependent and author-independent AV cases. In addition, we introduced a new, well-defined, manually labelled Turkish blog corpus to be used in subsequent studies on authorship analysis. Using a publicly available English blog corpus for generating the BBM, the proposed system demonstrated an accuracy of over 90% on test sets from both trained and unseen authors. Furthermore, the proposed combination method and the system using the BBM with the English blog corpus were also evaluated on other genres used in the international PAN AV competitions, and achieved promising results.
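The abstract does not spell out how the geometric-mean combination is computed. One plausible reading is an element-wise geometric mean of the two documents' stylometric feature vectors, sketched below. The function name `combine_geometric` is hypothetical, and the assumption that features are non-negative values (for example relative frequencies) is ours, not the paper's.

```python
import math


def combine_geometric(features_a, features_b):
    """Combine two documents' feature vectors into one pair representation
    by taking the element-wise geometric mean sqrt(a * b).

    Assumes both vectors use the same feature order and hold
    non-negative values (an assumption, not stated in the abstract).
    """
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    if any(x < 0 for x in features_a) or any(x < 0 for x in features_b):
        raise ValueError("geometric mean requires non-negative features")
    return [math.sqrt(a * b) for a, b in zip(features_a, features_b)]


pair_vector = combine_geometric([4.0, 0.0, 1.0], [1.0, 5.0, 1.0])
```

Under this reading the combination is symmetric in the two documents, and a feature that is absent (zero) in either document is zero in the pair representation, so only stylistic traits shared by both texts contribute.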

“…In a verification problem (see above) one is given writing examples of an author A, and one is asked to verify whether or not a document d of unknown authorship in fact is written by A. Recent contributions to the authorship attribution problem include (Rudman 1997; Stamatatos 2001, 2007, 2009; Chaski 2005; Juola 2006; Malyutov 2006; Sanderson and Guenter 2006b); the authorship verification problem is addressed in Koppel and Schler (2004b), van Halteren (2004, 2007), Meyer zu Eissen and Stein (2006, 2007), Koppel et al (2007), Stein et al (2008) and Pavelec et al (2008).…”

Section: Existing Research (mentioning)

confidence: 99%

Intrinsic plagiarism analysis

Stein

Lipka

Prettenhofer

2010

Lang Resources & Evaluation

112

Research in automatic text plagiarism detection focuses on algorithms that compare suspicious documents against a collection of reference documents. Recent approaches perform well in identifying copied or modified foreign sections, but they assume a closed world where a reference collection is given. This article investigates the question whether plagiarism can be detected by a computer program if no reference can be provided, e.g., if the foreign sections stem from a book that is not available in digital form. We call this problem class intrinsic plagiarism analysis; it is closely related to the problem of authorship verification. Our contributions are threefold. (1) We organize the algorithmic building blocks for intrinsic plagiarism analysis and authorship verification and survey the state of the art. (2) We show how the meta learning approach of Koppel and Schler, termed "unmasking", can be employed to post-process unreliable stylometric analysis results. (3) We operationalize and evaluate an analysis chain that combines document chunking, style model computation, one-class classification, and meta learning.

Problem statement: In the following, the term plagiarism refers to text plagiarism, i.e., the use of another author's information, language, or writing, when done without proper acknowledgment of the original source. Plagiarism detection refers to the unveiling of text plagiarism. Existing approaches to computer-based plagiarism detection break down this task into manageable parts: "Given a text d and a reference collection D, does d contain a section s for which one can find a document d_i ∈ D that contains a section s_i such that under some retrieval model R the similarity φ_R between s and s_i is above a threshold θ?" Observe that research on automated plagiarism detection presumes a closed world where a reference collection D is given. Since D can be extremely large (possibly the entire indexed part of the World Wide Web), the main research focus is on efficient search technology: near-similarity search and near-duplicate detection (Brin et al

“…As a method capable of capturing language learners' individual differences in their performance, linguistic profiling has been more frequently applied in studies related to language learning. According to Halteren (2007), the concept of profiling focuses on linguistic features, the statistical calculation of which could assist researchers in looking for information underlying the text.…”

Section: Introduction (mentioning)

confidence: 99%

An investigation of high-proficiency L2 English speakers' oral test performance: A profiling approach

Gao

2022

Front. Commun.

Linguistic profiles, which are often established through the measurement of linguistic features, are able to demonstrate characteristics shared by a specific type of text or a group of language learners. This paper examines the contexts and purposes related to profiling research in language studies, meanwhile synthesizing quantitative profiling methods such as cluster analysis, Principal Component Analysis (PCA), and Factor Analysis (FA). A profiling study of high-proficiency L2 English speakers' test performance is also presented, which explains the profiling procedure in L2 speaking assessment. Cluster analysis conducted on speech fluency and vocabulary variables rendered four different speech profiles, which are associated with the speakers' L1 background and L2 English proficiency level. This paper also discusses the interpretation of linguistic profiles, as well as the statistical concerns involved in the profile construction process.
