2014
DOI: 10.1016/j.patrec.2012.11.007

An iterative multimodal framework for the transcription of handwritten historical documents

Abstract: The transcription of historical documents is one of the most interesting tasks to which Handwritten Text Recognition (HTR) can be applied, given its relevance to humanities research. An alternative way to transcribe ancient manuscripts is speech dictation using Automatic Speech Recognition (ASR) techniques. Both alternatives employ similar models (Hidden Markov Models and n-grams) and decoding processes (Viterbi decoding), which makes it possible to combine the two modalities with little …
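Both modalities decode with the Viterbi algorithm over HMMs. The sketch below is a toy illustration only. It assumes discrete emissions and two hypothetical states, "ink" and "space", whereas real HTR and ASR systems use continuous-density character or phoneme HMMs. It shows how the most probable state sequence is found in log space and then recovered by backtracing.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    """Log-probability, mapping zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most probable state path and its log-probability."""
    # score[t][s]: best log-prob of any path ending in state s at time t
    score = [{s: _log(start_p[s]) + _log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor r for reaching state s at time t
            prev, best = max(
                ((r, score[t - 1][r] + _log(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            score[t][s] = best + _log(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Backtrace from the best final state
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last]


# Toy model (hypothetical numbers): background "space" vs. pen-stroke "ink"
states = ["space", "ink"]
start = {"space": 0.6, "ink": 0.4}
trans = {"space": {"space": 0.7, "ink": 0.3}, "ink": {"space": 0.4, "ink": 0.6}}
emit = {
    "space": {"light": 0.5, "grey": 0.4, "dark": 0.1},
    "ink": {"light": 0.1, "grey": 0.3, "dark": 0.6},
}
path, logp = viterbi(["light", "grey", "dark"], states, start, trans, emit)
print(path)  # ['space', 'space', 'ink']
```

Working in log space avoids numerical underflow on long feature sequences. The path probability here is exp(logp) ≈ 0.01512.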

Cited by 14 publications
(19 citation statements)
References 28 publications
“…The proposed approach performs OCR, calculates its confidence, and based on it takes the speech recognition result to make a combination and provide an alternative hypothesis. A similar approximation is that presented in [1], where speech or handwritten recognition results, in the form of word-graphs, are used to enhance the language model for recognising with the other modality. An alternative that does not use language model enhancement is proposed in [7] where Confusion Network combination (similar to that of [13]) is used for the combination of these two modalities.…”
Section: Related Work (mentioning)
confidence: 99%
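The statement above cites [7] as combining the two modalities through Confusion Network combination. The sketch below illustrates that idea under several assumptions. Both networks are taken to be already aligned slot by slot; that alignment is the hard part in practice and is omitted here. The weight `weight_a` is a hypothetical fixed parameter, and "" stands for an empty (epsilon) arc. Posteriors are merged per slot, and the best word is read off each slot.

```python
from typing import Dict, List

# A confusion network: a list of slots, each mapping a word ("" = epsilon)
# to its posterior probability.
ConfusionNetwork = List[Dict[str, float]]


def combine_confusion_networks(
    cn_a: ConfusionNetwork, cn_b: ConfusionNetwork, weight_a: float = 0.5
) -> ConfusionNetwork:
    """Weighted per-slot sum of posteriors; assumes pre-aligned networks."""
    if len(cn_a) != len(cn_b):
        raise ValueError("this sketch assumes networks of equal length")
    combined: ConfusionNetwork = []
    for slot_a, slot_b in zip(cn_a, cn_b):
        merged = {
            word: weight_a * slot_a.get(word, 0.0) + (1.0 - weight_a) * slot_b.get(word, 0.0)
            for word in set(slot_a) | set(slot_b)
        }
        combined.append(merged)
    return combined


def best_path(cn: ConfusionNetwork) -> List[str]:
    """Pick the highest-posterior word per slot, dropping epsilon arcs."""
    words = []
    for slot in cn:
        word = max(slot, key=slot.get)
        if word:
            words.append(word)
    return words


# Hypothetical HTR and ASR networks for the same line
htr = [{"en": 0.6, "on": 0.4}, {"le": 0.6, "la": 0.4}, {"": 0.7, "de": 0.3}, {"villa": 0.9, "vida": 0.1}]
asr = [{"en": 0.8, "in": 0.2}, {"la": 0.9, "las": 0.1}, {"de": 0.8, "": 0.2}, {"villa": 0.7, "vida": 0.3}]
print(best_path(combine_confusion_networks(htr, asr)))  # ['en', 'la', 'de', 'villa']
```

In this toy case, the speech posteriors overturn the HTR choices "le" and the empty arc in slots two and three.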
“…This transcription will be provided to a paleographer to obtain the final quality transcription with the lowest effort. The framework is mainly based on two ideas: using the current system output to obtain an adapted language model that can be employed in the next decoding step [1], and combining the decoding outputs of the two modalities to obtain a final output with less errors [7].…”
Section: Crowdsourcing Framework (mentioning)
confidence: 99%
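The first idea above is to adapt the language model with the current output before the next decoding step. The sketch below illustrates it with a simplification: it interpolates a base bigram model with one estimated from a plain 1-best hypothesis, whereas [1] is described as using word-graphs. The weight `lam` and the helper names are hypothetical. The models are unsmoothed maximum-likelihood estimates, unlike a production n-gram LM with smoothing or backoff.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

Bigram = Dict[str, Dict[str, float]]


def bigram_mle(sentences: List[List[str]]) -> Bigram:
    """Maximum-likelihood bigram probabilities P(word | prev), no smoothing."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    model: Bigram = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        model[prev] = {w: c / total for w, c in nexts.items()}
    return model


def adapt(base: Bigram, hyp: Bigram, lam: float) -> Callable[[str, str], float]:
    """Linear interpolation: lam * P_hyp(w | prev) + (1 - lam) * P_base(w | prev)."""
    def prob(prev: str, word: str) -> float:
        p_hyp = hyp.get(prev, {}).get(word, 0.0)
        p_base = base.get(prev, {}).get(word, 0.0)
        return lam * p_hyp + (1.0 - lam) * p_base
    return prob


base = bigram_mle([["la", "casa"], ["la", "villa"]])
hyp = bigram_mle([["la", "villa"]])  # e.g. 1-best output of the other modality
p = adapt(base, hyp, lam=0.5)
print(p("la", "villa"), p("la", "casa"))  # 0.75 0.25
```

Words hypothesised by the other modality gain probability mass for the next decoding pass, which is the intuition behind iterating decode, adapt, and decode again.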
“…However, bimodal combination in continuous decoding is a hard problem because of time asynchrony between the two signals, i.e., the sequence of feature vectors for each modality differs in length and it is not easy to find the time points where the same elements (words in this case) are synchronised. An initial approximation for this case was presented in [1].…”
Section: Introduction (mentioning)
confidence: 99%
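One way to sidestep the frame-level asynchrony described above is to compare the modalities at the word level instead. The sketch below is an illustration of that general idea, not the method of [1]. It aligns two hypothetical 1-best word sequences with a Levenshtein dynamic program and returns the matching word positions as candidate synchronisation points.

```python
from typing import List, Tuple


def align_words(a: List[str], b: List[str]) -> Tuple[int, List[Tuple[int, int]]]:
    """Word edit distance plus (i, j) index pairs where a[i] == b[j] align."""
    n, m = len(a), len(b)
    # cost[i][j] = edit distance between a[:i] and b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace one optimal alignment, collecting exact matches as anchors
    i, j, anchors = n, m, []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            anchors.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j - 1] + 1:  # substitution
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:  # word only in a
            i -= 1
        else:  # word only in b
            j -= 1
    anchors.reverse()
    return cost[n][m], anchors


htr = ["en", "le", "villa", "de", "Madrid"]
asr = ["en", "la", "villa", "Madrid"]
print(align_words(htr, asr))  # (2, [(0, 0), (2, 2), (4, 3)])
```

The anchors split both hypotheses into shorter segments, and each segment can then be combined independently.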