Universum Inference and Corpus Homogeneity

Vogel, Carl; Lynch, Gerard; Janssen, Jerom F.

doi:10.1007/978-1-84882-171-2_29

Cited by 3 publications

(5 citation statements)

References 5 publications

“…Various methods have been proposed in the literature to address author verification. In particular ensemble methods have proved quite successful at tackling the challenge of capturing the features relevant to the author's style and discarding the ones which are not [7][8][9][10][21]. Author verification has also been the focus of several iterations of the PAN shared tasks, e.g.…”

Section: Related Work (mentioning)

confidence: 99%

“…A common option is to use the results of some Google queries formed by randomly picking words from the set of input documents as impostors. In the experiments presented below (see section 5), we opt for using all the training documents as impostors. While this option is not ideal since the documents obviously include precisely the documents to be compared, it is a reasonable simplification if the training set is diverse enough in terms of authors and if the number of iterations is large enough to prevent the occasional wrong comparison from having a significant effect on the output features.…”

Section: General Impostor Strategy (mentioning)

confidence: 99%
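
The impostor idea in the statement above can be illustrated with a minimal sketch. It assumes the commonly described scheme in which, over many iterations, a random subset of features and of impostors is drawn and one checks whether the two documents under comparison are closer to each other than to every sampled impostor. The function names, the relative-frequency profile, the Manhattan distance and all parameter values are hypothetical choices made for illustration; they are not taken from the cited paper or its software.

```python
import random
from collections import Counter


def profile(tokens):
    """Relative token frequencies of a document (illustrative feature set)."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}


def distance(p, q, features):
    """Manhattan distance restricted to the sampled features."""
    return sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in features)


def impostor_score(doc_x, doc_y, impostors, iterations=100,
                   n_impostors=2, feature_ratio=0.5, seed=0):
    """Fraction of iterations in which doc_y is strictly closer to doc_x
    than every sampled impostor, using a random feature subset each time."""
    rng = random.Random(seed)
    px, py = profile(doc_x), profile(doc_y)
    impostor_profiles = [profile(d) for d in impostors]
    vocab = sorted(set(px) | set(py))
    k = max(1, int(len(vocab) * feature_ratio))
    wins = 0
    for _ in range(iterations):
        features = rng.sample(vocab, k)
        sampled = rng.sample(impostor_profiles,
                             min(n_impostors, len(impostor_profiles)))
        d_xy = distance(px, py, features)
        if all(d_xy < distance(px, imp, features) for imp in sampled):
            wins += 1
    return wins / iterations
```

A score near 1 suggests the two documents resemble each other more than they resemble the impostors. Using the training documents themselves as impostors, as the statement describes, means an impostor may occasionally be one of the compared documents; a large number of iterations is what limits the effect of such draws.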

“…This strategy follows the idea described in [8]: in this paper, a large corpus containing several "categories" (the input documents in our task) is split into small chunks. A chunk of category A is compared against many chunks from other categories and from category A as well, picked randomly.…”

Section: Universum Inference Strategy (mentioning)

confidence: 99%

“…3. The three categories are compared in the same way as in [8], that is, both against itself (using the two parts belonging to this category) and against each other category (picking one of the two parts randomly).…”

Section: Universum Inference Strategy (mentioning)

confidence: 99%
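
The chunk-based comparison described in the two statements above can be sketched as follows. This is a hedged illustration only: the chunk size, the number of random samples and the frequency-difference distance are assumptions introduced here, and the distance in particular stands in for whatever similarity measure the original work uses.

```python
import random
from collections import Counter


def chunks(tokens, size):
    """Split a token list into consecutive chunks of `size` tokens,
    dropping any shorter remainder."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def distance(a, b):
    """Illustrative dissimilarity: summed absolute difference of
    relative token frequencies between two chunks."""
    fa, fb = Counter(a), Counter(b)
    return sum(abs(fa[t] / len(a) - fb[t] / len(b)) for t in set(fa) | set(fb))


def mean_distances(corpus, size=50, samples=20, seed=0):
    """For every ordered category pair (A, B), average the distance between
    each chunk of A and randomly picked chunks of B. The pair (A, A) is the
    self-comparison of category A."""
    rng = random.Random(seed)
    split = {cat: chunks(tokens, size) for cat, tokens in corpus.items()}
    result = {}
    for a, a_chunks in split.items():
        for b, b_chunks in split.items():
            total, count = 0.0, 0
            for i, chunk in enumerate(a_chunks):
                # In the self-comparison, never compare a chunk with itself.
                pool = [c for j, c in enumerate(b_chunks) if a != b or j != i]
                if not pool:
                    continue
                for _ in range(samples):
                    total += distance(chunk, rng.choice(pool))
                    count += 1
            if count:
                result[(a, b)] = total / count
    return result
```

Under these assumptions, a category whose self-comparison distance is clearly lower than its distances to the other categories can be read as internally homogeneous; a category that is as close to others as to itself cannot.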

CLG Authorship Analytics: a library for authorship verification

Moreau, Vogel (2022). Int J Digit Humanities

The task of authorship verification consists in detecting whether two texts have been written by the same person. This paper describes the CLG Authorship Analytics software, which implements several individual methods as well as a stacked generalization system for authorship verification. The approach relies primarily on ensemble learning methods, i.e. repeatedly sampling the data in order to capture the invariant stylistic patterns. The approach is tested through a series of experiments designed to test the ability of the system to generalize, depending on various parameters. The code and results of the experiments are publicly available.

“…It is convenient to conflate the notions of "methods" and "tools". Various aspects of the tools and analysis conducted using the tools have been published (Appel & Vogel, 2001;Van Gijsel & Vogel, 2003;O'Brien & Vogel, 2003;Vogel, 2007b;Healey, Vogel, & Eshghi, 2007;Vogel, 2007a;Vogel & Brisset, 2007;Frontini, Lynch, & Vogel, 2008;Vogel, Lynch, & Janssen, 2008). Student projects (e.g.…”

Section: Introduction (mentioning)

confidence: 99%

Found in translation

Vogel, Lynch, Moreau et al. (2013)

Self Cite

We describe translation effects that have been studied in the automated text classification literature. We expand on a point within this research space, quality effects, with our own work in this area. We present an efficient method for evaluating text quality on the basis of reference texts. The method, which is general to text classification problems more widely construed, is related to the background literature and argued to be effective on the strength of the fact that it enables quality checking of amounts of text that exceed what is humanly feasible to verify. The method partially automates the process: in processing the entirety of a translated corpus being probed, it ranks items for stylistic conformity with a reference corpus, and the least conforming ranks are indicated as the items most likely to require human intervention.

Generative Adversarial Networks in Federated Learning

Miao, Vogel (2023). Smart Innovation, Systems and Technologies
