Combining handwriting and speech recognition for transcribing historical handwritten documents

Granell, Emilio; Martínez-Hinarejos, Carlos D.

doi:10.1109/icdar.2015.7333739

Cited by 10 publications

(15 citation statements)

References 11 publications


“…Figure 9 draws the baseline values for both modalities and the evolution of the system and ASR outputs for the whole test ASR corpus (CE = 1350) without reliability verification. As can be observed, the language model interpolation permits to reduce the error level in the next speech decoding process [1], and the combination with the speech decoding results allows the system output to converge to a better hypothesis with less errors to correct [7]. Besides, the ASR performance is considerably improved reducing the average WER baseline value (60.5%±1.3) to 33.9%±4.8.…”

Section: B Preliminary Experiments (mentioning)

confidence: 86%
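The WER figures quoted in the statement above are word error rates. As a reminder of how this metric is conventionally defined (word-level edit distance divided by the number of reference words), here is a minimal Python sketch. The function name and the example strings are hypothetical and are not taken from the cited papers, whose exact scoring setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition the value can exceed 1.0 when the hypothesis contains many insertions, so it is an error rate rather than a bounded accuracy score.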

“…A similar approximation is that presented in [1], where speech or handwritten recognition results, in the form of word-graphs, are used to enhance the language model for recognising with the other modality. An alternative that does not use language model enhancement is proposed in [7] where Confusion Network combination (similar to that of [13]) is used for the combination of these two modalities.…”

Section: Related Work (mentioning)

confidence: 99%

“…This transcription will be provided to a paleographer to obtain the final quality transcription with the lowest effort. The framework is mainly based on two ideas: using the current system output to obtain an adapted language model that can be employed in the next decoding step [1], and combining the decoding outputs of the two modalities to obtain a final output with less errors [7].…”

Section: Crowdsourcing Framework (mentioning)

confidence: 99%

“…This framework employs the bimodal Confusion Network combination method defined in [7], [8]. Specifically, starting from the system and the speech decoding outputs in CN format, the following steps are taken: 1) Anchor subnetworks are searched in order to align the subnetworks of both Confusion Networks.…”

Section: B Multimodal Combination (mentioning)

confidence: 99%


Multimodal Crowdsourcing for Transcribing Handwritten Documents

Granell

Martínez-Hinarejos

2017

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite


Abstract: Transcription of handwritten documents is an important research topic for multiple applications, such as document classification or information extraction. In the case of historical documents, their transcription allows to preserve cultural heritage because of the amount of historical data contained in those documents. The transcription process can employ state-of-the-art handwritten text recognition systems in order to obtain an initial transcription. This transcription is usually not good enough for the quality standards, but that may speed up the final transcription of the expert. In this framework, the use of collaborative transcription applications (crowdsourcing) has risen in the recent years, but these platforms are mainly limited by the use of non-mobile devices. Thus, the recruiting initiatives get reduced to a smaller set of potential volunteers. In this work, an alternative that allows the use of mobile devices is presented. The proposal consists of using speech dictation of handwritten text lines. Then, by using multimodal combination of speech and handwritten text images, a draft transcription can be obtained, presenting more quality than that obtained by only using handwritten text recognition. The speech dictation platform is implemented as a mobile device application, which allows for a wider range of population for recruiting volunteers. A real acquisition on the contents of a Spanish historical handwritten book was obtained with the platform. This data was used to perform experiments on the behaviour of the proposed framework. Some experiments were performed to study how to optimise the collaborators effort in terms of number of collaborations, including how many lines and which lines should be selected for the speech dictation.


“…This can be done using combination techniques based in confusion networks, such as that presented in [16] …”

Section: Discussion (mentioning)

confidence: 99%

Using the MGGI Methodology for Category-Based Language Modeling in Handwritten Marriage Licenses Books

Romero

Fornés

Vidal

et al. 2016

2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)


IEEE. Romero Gómez, V.; Fornés, A.; Vidal Ruiz, E.; Sánchez Peiró, J.A. (2016). Abstract: Handwritten marriage licenses books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demography studies and genealogical research, among others. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction is difficult due to the distinct and evolutionary vocabulary, which is composed mainly of proper names that change along the time. In previous works we studied the use of category-based language models to both improve the automatic transcription accuracy and make easier the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the language model obtained. Using this language model, full handwritten text recognition experiments have been carried out, with results supporting the interest of the proposed approach.


Collaborator Effort Optimisation in Multimodal Crowdsourcing for Transcribing Historical Manuscripts

Granell

Martínez-Hinarejos

2016

Advances in Speech and Language Technologies for Iberian Languages
