An iterative multimodal framework for the transcription of handwritten historical documents

Alabau, Vicent; Martínez-Hinarejos, Carlos D.; Romero, Verónica; Lagarda, Antonio-L.

doi:10.1016/j.patrec.2012.11.007

Cited by 14 publications

(19 citation statements)

References 28 publications


“…The proposed approach performs OCR, calculates its confidence, and based on it takes the speech recognition result to make a combination and provide an alternative hypothesis. A similar approximation is that presented in [1], where speech or handwritten recognition results, in the form of word-graphs, are used to enhance the language model for recognising with the other modality. An alternative that does not use language model enhancement is proposed in [7] where Confusion Network combination (similar to that of [13]) is used for the combination of these two modalities.…”

Section: Related Work (mentioning)

confidence: 99%

“…This transcription will be provided to a paleographer to obtain the final quality transcription with the lowest effort. The framework is mainly based on two ideas: using the current system output to obtain an adapted language model that can be employed in the next decoding step [1], and combining the decoding outputs of the two modalities to obtain a final output with less errors [7].…”

Section: Crowdsourcing Framework (mentioning)

confidence: 99%

“…The language interpolation module builds a statistical language model conditioned on a sample x as follows [1]:…”

Section: A Language Model Interpolation (mentioning)

confidence: 99%

“…Therefore, preserving their contents is crucial for cultural and historical reasons. The interest in this preservation by using transcription led to the development of international projects such as tranScriptorium 1 or READ 2 . Quality transcriptions are usually done by experts; in the case of historical texts, because of their special features (scripting, image quality, vocabulary, ancient language, etc.…”

Section: Introduction (mentioning)

confidence: 99%


Multimodal Crowdsourcing for Transcribing Handwritten Documents

Granell

Martínez-Hinarejos

2017

IEEE/ACM Trans. Audio Speech Lang. Process.


Abstract: Transcription of handwritten documents is an important research topic for multiple applications, such as document classification or information extraction. In the case of historical documents, their transcription allows to preserve cultural heritage because of the amount of historical data contained in those documents. The transcription process can employ state-of-the-art handwritten text recognition systems in order to obtain an initial transcription. This transcription is usually not good enough for the quality standards, but that may speed up the final transcription of the expert. In this framework, the use of collaborative transcription applications (crowdsourcing) has risen in the recent years, but these platforms are mainly limited by the use of non-mobile devices. Thus, the recruiting initiatives get reduced to a smaller set of potential volunteers. In this work, an alternative that allows the use of mobile devices is presented. The proposal consists of using speech dictation of handwritten text lines. Then, by using multimodal combination of speech and handwritten text images, a draft transcription can be obtained, presenting more quality than that obtained by only using handwritten text recognition. The speech dictation platform is implemented as a mobile device application, which allows for a wider range of population for recruiting volunteers. A real acquisition on the contents of a Spanish historical handwritten book was obtained with the platform. This data was used to perform experiments on the behaviour of the proposed framework. Some experiments were performed to study how to optimise the collaborators effort in terms of number of collaborations, including how many lines and which lines should be selected for the speech dictation.


“…However, bimodal combination in continuous decoding is a hard problem because of time asynchrony between the two signals, i.e., the sequence of feature vectors for each modality differs in length and it is not easy to find the time points where the same elements (words in this case) are synchronised. An initial approximation for this case was presented in [1].…”

Section: Introduction (mentioning)

confidence: 99%

Multimodal Output Combination for Transcribing Historical Handwritten Documents

Granell

Martínez-Hinarejos

2015

Computer Analysis of Images and Patterns

Self Cite


Abstract. Transcription of digitalised historical documents is an interesting task in the document analysis area. This transcription can be achieved by using Handwritten Text Recognition (HTR) on digitalised pages or by using Automatic Speech Recognition (ASR) on the dictation of contents. Moreover, another option is using both systems in a multimodal combination to obtain a draft transcription, given that combining the outputs of different recognition systems will generally improve the recognition accuracy. In this work, we present a new combination method based on Confusion Network. We check its effectiveness for transcribing a Spanish historical book. Results on both unimodal combination with different optical (for HTR) and acoustic (for ASR) models, and multimodal combination, show a relative reduction of Word and Character Error Rate of 14.3% and 16.6%, respectively, over the HTR baseline.
