2013
DOI: 10.4000/jtei.739

Measuring the Correctness of Double-Keying: Error Classification and Quality Control in a Large Corpus of TEI-Annotated Historical Text

Abstract: Among mass digitization methods, double-keying is considered to be the one with the lowest error rate. This method requires two independent transcriptions of a text by two different operators. It is particularly well suited to historical texts, which often exhibit deficiencies like poor master copies or other difficulties such as spelling variation or complex text structures. Providers of data entry services using the double-keying method generally advertise very high accuracy rates (around 99.95% to 99.98%). T…

Cited by 10 publications (7 citation statements)
References 4 publications
“…The two resulting versions are compared in order to detect transcription errors. The premise of double-keying is that two human operators are unlikely to make the same mistakes (Haaf et al, 2013). However, because, as we have already established, Chinese characters are so diverse and variable, two independent operators in a double-keying process may recognize the same form as different characters and thus may produce two results, which may both be right or both be wrong.…”
Section: The Task Design Of Full-text Generation Of Mass Chas and Two Principles Of Task Assignment. Citation type: mentioning
confidence: 99%
“…According to Doan et al () VotoSocial's architecture is explicit; that is, users explicitly collaborate with the platform, as opposed to implicit collaboration where users collaborate as a side effect of their actions in a system. Furthermore, VotoSocial assigns the users with task execution, all work is undertaken online, and unlike other systems that send tickets when a user finds a mis‐transcription (Haaf, Wiegand, & Geyken, ), the correction is done online and almost in real time. Still, users may send reports when they find suspicious records such as the one seen in Figure .…”
Section: The Launch Of Votosocial. Citation type: mentioning
confidence: 99%
“…A triple verification process was used, allowing for user‐controlled validation of the data. This is similar to a double‐keying system (Haaf et al, ), but it was decided that triple validation would be more user friendly (i.e., requiring only a single click to register a transcription as accurate or not; as opposed to double‐keying systems where all users need to input the entire text that is being digitized) without compromising accuracy, and would also enable the community to automatically validate the digitization process. (See Figure , bottom left, showing the two buttons to register the transcription as either correct or incorrect.)…”
Section: The Launch Of Votosocial. Citation type: mentioning
confidence: 99%
“…Alternatively, with substantial effort, digitization can be performed manually, or by manual correction of OCR output (Tanner et al, 2009). However, even for manually "keyed-in" corpora, noise can be introduced due to errors in workflow (Haaf et al, 2013).…”
Section: Introduction. Citation type: mentioning
confidence: 99%