2015
DOI: 10.1007/978-3-319-23192-1_21

Multimodal Output Combination for Transcribing Historical Handwritten Documents

Abstract: Transcription of digitised historical documents is an interesting task in the document analysis area. This transcription can be achieved by using Handwritten Text Recognition (HTR) on digitised pages or by using Automatic Speech Recognition (ASR) on the dictation of contents. Moreover, another option is using both systems in a multimodal combination to obtain a draft transcription, given that combining the outputs of different recognition systems will generally improve the recognition accuracy. …
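
As a rough illustration of the idea of combining recognizer outputs (the paper itself combines Confusion Networks, as the excerpts below describe), here is a minimal Python sketch that merges two already-aligned word-level hypotheses by confidence-weighted selection. The word-by-word alignment, the word lists and the confidence scores are all hypothetical, not taken from the paper.

# Illustrative sketch only: word-level combination of an HTR and an ASR
# hypothesis by confidence-weighted voting, assuming both hypotheses are
# already aligned word by word. All scores below are made up.

def combine_hypotheses(htr_words, asr_words):
    """Pick, for each aligned position, the word with the higher confidence."""
    combined = []
    for (w_htr, p_htr), (w_asr, p_asr) in zip(htr_words, asr_words):
        if w_htr == w_asr:
            combined.append(w_htr)      # both recognizers agree
        elif p_htr >= p_asr:
            combined.append(w_htr)      # trust the more confident source
        else:
            combined.append(w_asr)
    return combined

if __name__ == "__main__":
    # Hypothetical aligned outputs: (word, confidence)
    htr = [("the", 0.9), ("qvick", 0.4), ("brown", 0.8), ("fox", 0.7)]
    asr = [("the", 0.8), ("quick", 0.9), ("brown", 0.6), ("box", 0.5)]
    print(" ".join(combine_hypotheses(htr, asr)))  # -> the quick brown fox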

Cited by 7 publications (6 citation statements)
References 11 publications (9 reference statements)

“…We assume, however, that for a music practitioner it would be, at least, more appealing to play a composition while reading a music sheet rather than manually transcribing it. Note that we find the same scenario in the field of Handwritten Text Recognition, where producing an utterance from a written text and using a speech recognition system, and then fusing the decisions, required less effort than manually transcribing the text or correcting the errors produced by the text recognition system [8].…”
Section: Introduction (mentioning)
confidence: 53%
“…This framework employs the bimodal Confusion Network combination method defined in [7], [8]. Specifically, starting from the system and the speech decoding outputs in CN format, the following steps are taken: 1) Anchor subnetworks are searched for in order to align the subnetworks of both Confusion Networks.…”
Section: B. Multimodal Combination (mentioning)
confidence: 99%
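
The excerpt above outlines the bimodal Confusion Network (CN) combination of [7], [8]: anchor subnetworks are located so that the two CNs can be aligned before merging. The Python sketch below illustrates only that anchor-finding idea under simplifying assumptions; the CN representation (a list of word-to-posterior dictionaries), the anchor criterion (identical top-scoring word) and the merging by averaged posteriors are illustrative choices, not the authors' exact procedure, and the alignment of the subnetworks between anchors is omitted.

# Minimal sketch of anchor-based alignment of two Confusion Networks.
# Each CN is a list of segments; each segment maps candidate words to
# posterior probabilities. Anchors are segment pairs whose top-scoring
# word coincides; the subnetworks between anchors would then be aligned
# and merged as well (omitted here). Illustrative assumption, not the
# method of [7], [8].

def top_word(segment):
    return max(segment, key=segment.get)

def find_anchors(cn_a, cn_b):
    """Return index pairs (i, j) where both CNs share the same top word."""
    anchors, j_start = [], 0
    for i, seg_a in enumerate(cn_a):
        for j in range(j_start, len(cn_b)):
            if top_word(seg_a) == top_word(cn_b[j]):
                anchors.append((i, j))
                j_start = j + 1
                break
    return anchors

def merge_segments(seg_a, seg_b):
    """Average the posteriors of two aligned segments."""
    words = set(seg_a) | set(seg_b)
    return {w: 0.5 * (seg_a.get(w, 0.0) + seg_b.get(w, 0.0)) for w in words}

if __name__ == "__main__":
    # Hypothetical CNs from an HTR system and an ASR system.
    cn_htr = [{"the": 0.9}, {"qvick": 0.5, "quick": 0.4}, {"fox": 0.8}]
    cn_asr = [{"the": 0.8}, {"quick": 0.7, "quit": 0.2}, {"fox": 0.6, "box": 0.3}]
    anchors = find_anchors(cn_htr, cn_asr)
    merged = [merge_segments(cn_htr[i], cn_asr[j]) for i, j in anchors]
    print(anchors)                       # -> [(0, 0), (2, 2)]
    print([top_word(s) for s in merged])  # -> ['the', 'fox']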
“…Read4SpeechExperiments is a free-software Android application designed to facilitate speech acquisition from mobile devices. The source code is available on GitLab 8, and it can be installed from the Google Play 9 and the F-Droid 10 platforms.…”
Section: B. Crowdsourcing Speech Acquisition (mentioning)
confidence: 99%
“…The multimodal paradigm has experienced spectacular growth in recent years owing to the development of mobile devices (Di Fabbrizio et al, 2009), where different modalities (mainly speech and touch) are employed to manage the device. In the case of Image or Natural Language Processing tasks, multimodality has been applied to problems where signals of a different nature that represent the same final object are available (Mihalcea, 2012; Potamianos et al, 2003; Sebe et al, 2005; Granell and Martínez-Hinarejos, 2015b). In any case, multimodality is strongly linked to human-computer interaction, since the user may employ different modalities to achieve a more ergonomic or faster interaction towards an objective.…”
Section: Introduction (mentioning)
confidence: 99%