Proceedings of the 2016 ACM Symposium on Document Engineering (DocEng 2016)
DOI: 10.1145/2960811.2960815

A Multimodal Crowdsourcing Framework for Transcribing Historical Handwritten Documents

Cited by 10 publications (10 citation statements)
References 14 publications
“…Regarding the adjustment of the crowdsourcing framework, in a previous work we observed that in this system the speaker order and the reliability verification did not have a significant impact on the results. Moreover, the highest reliability for this test set is obtained when the multimodal combination is slightly biased towards the speech output (α = 0.6) and the LM interpolation towards the original LM (λ = 0.4).…”
Section: Experimental Conditions
confidence: 49%
“…The initial multimodal crowdsourcing framework allowed us to use speech utterances to improve the transcription of historical manuscripts. This system was improved by adding a line selection module that optimizes the collaboration effort (CE).…”
Section: Related Work
confidence: 99%
“…In a previous work [9] we observed that this crowdsourcing framework achieves the highest reliability (for this corpus) when the multimodal combination is slightly biased towards the speech output (α = 0.6, with Θ = 10⁻⁴) and the language model interpolation towards the original model (λ = 0.4). We also noted that the speaker ordering and the reliability verification did not have a significant impact on the results.…”
Section: A Baseline and Framework Adjustment
confidence: 73%
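The parameters mentioned in these excerpts (a modality weight α, an LM interpolation weight λ, and a threshold Θ) can be illustrated with a minimal sketch. The excerpts do not give the exact formulation, so the log-linear score combination, the linear LM interpolation, the use of Θ as a pruning threshold, and all names below are illustrative assumptions rather than the authors' actual implementation.

```python
# Hypothetical sketch of the kind of combination described above.
# alpha weights the speech (ASR) score against the handwriting (HTR) score,
# lambda interpolates the original language model with an adapted one,
# and theta is assumed here to act as a pruning threshold.

import math

ALPHA = 0.6    # weight given to the speech (ASR) score in the combination
LAMBDA = 0.4   # weight of the original language model in the interpolation
THETA = 1e-4   # assumed pruning threshold for low-probability hypotheses


def combine_scores(htr_score: float, asr_score: float, alpha: float = ALPHA) -> float:
    """Log-linear combination of per-hypothesis log-scores from both modalities."""
    return (1.0 - alpha) * htr_score + alpha * asr_score


def interpolate_lm(p_original: float, p_adapted: float, lam: float = LAMBDA) -> float:
    """Linear interpolation of the original LM with an LM adapted to the other modality."""
    return lam * p_original + (1.0 - lam) * p_adapted


def rescore(hypotheses):
    """Rescore (text, htr_log, asr_log, p_lm_orig, p_lm_adapt) tuples and prune with THETA."""
    rescored = []
    for text, htr, asr, p_orig, p_adapt in hypotheses:
        lm_prob = interpolate_lm(p_orig, p_adapt)
        if lm_prob < THETA:  # discard hypotheses the interpolated LM deems too unlikely
            continue
        score = combine_scores(htr, asr) + math.log(lm_prob)
        rescored.append((text, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```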
“…Works such as that of [4] show the feasibility of acquiring speech corpora with mobile devices and the capacity of the crowdsourcing framework to obtain speech corpora annotated at several levels. In [9], a first step towards incorporating multimodality into crowdsourcing is presented, with a framework in which the acquired modality (speech) is not the one to be transcribed (handwritten text).…”
Section: Related Work
confidence: 99%