2015 13th International Conference on Document Analysis and Recognition (ICDAR)
DOI: 10.1109/icdar.2015.7333739

Combining handwriting and speech recognition for transcribing historical handwritten documents

Cited by 10 publications (15 citation statements); references 11 publications.

“…Figure 9 draws the baseline values for both modalities and the evolution of the system and ASR outputs for the whole test ASR corpus (CE = 1350) without reliability verification. As can be observed, the language model interpolation permits to reduce the error level in the next speech decoding process [1], and the combination with the speech decoding results allows the system output to converge to a better hypothesis with less errors to correct [7]. Besides, the ASR performance is considerably improved reducing the average WER baseline value (60.5%±1.3) to 33.9%±4.8.…”
Section: B Preliminary Experiments (mentioning; confidence: 86%)

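The statement above reports ASR quality as word error rate (WER). As a reminder of what that figure measures, the sketch below computes WER as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. It is a generic textbook implementation, not the scoring tool used in the cited work; the function name `word_error_rate` is a hypothetical label chosen for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WER can exceed 100%, which is one reason baseline values for hard tasks (such as the 60.5% quoted above) can be high.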
“…A similar approximation is that presented in [1], where speech or handwritten recognition results, in the form of word-graphs, are used to enhance the language model for recognising with the other modality. An alternative that does not use language model enhancement is proposed in [7] where Confusion Network combination (similar to that of [13]) is used for the combination of these two modalities.…”
Section: Related Work (mentioning; confidence: 99%)

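The first approach quoted above uses one modality's recognition output to enhance the language model used for the other. A minimal sketch of that idea, under strong simplifying assumptions: the cited work builds on word-graphs, whereas here the "other modality" output is just a list of hypothesis strings, both models are unigrams, and they are merged by linear interpolation with a hypothetical weight `lam`. The function names are illustrative, not taken from the paper.

```python
from collections import Counter


def unigram_from_hypotheses(hypotheses: list[str]) -> dict[str, float]:
    """Maximum-likelihood unigram estimated from the other modality's hypotheses."""
    counts = Counter(word for hyp in hypotheses for word in hyp.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def interpolate(base: dict[str, float], adapted: dict[str, float], lam: float) -> dict[str, float]:
    """P(w) = lam * P_base(w) + (1 - lam) * P_adapted(w) over the union vocabulary."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(base) | set(adapted)
    return {w: lam * base.get(w, 0.0) + (1.0 - lam) * adapted.get(w, 0.0) for w in vocab}


if __name__ == "__main__":
    base_lm = {"the": 0.5, "king": 0.25, "church": 0.25}      # hypothetical baseline LM
    htr_output = ["the king", "the kingdom"]                   # hypothetical handwriting hypotheses
    adapted_lm = unigram_from_hypotheses(htr_output)
    print(interpolate(base_lm, adapted_lm, lam=0.5))
```

Words that the handwriting recognizer proposed (here "kingdom") gain probability mass for the next speech decoding pass, which is the intuition behind the error reduction described in the statements above. A real system would interpolate n-gram models and tune the weight on held-out data.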
“…This can be done using combination techniques based in confusion networks, such as that presented in [16] …”
Section: Discussion (mentioning; confidence: 99%)
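Both of the last two statements refer to combining the modalities through confusion networks. The sketch below illustrates only the final voting step of such a combination, assuming the handwriting and speech confusion networks have already been aligned slot by slot (real systems must first align them, typically with dynamic programming, and this sketch does not attempt that). Each slot maps words to posteriors, the empty string stands for the null (epsilon) arc, and `weight_a` is a hypothetical modality weight. This is a generic illustration, not the method of the cited papers.

```python
Slot = dict[str, float]  # word -> posterior; "" represents the null (epsilon) arc


def combine_confusion_networks(cn_a: list[Slot], cn_b: list[Slot], weight_a: float = 0.5) -> list[str]:
    """Weighted posterior voting over two pre-aligned confusion networks."""
    if len(cn_a) != len(cn_b):
        raise ValueError("this sketch assumes the networks are already aligned slot by slot")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    output = []
    for slot_a, slot_b in zip(cn_a, cn_b):
        words = set(slot_a) | set(slot_b)
        scores = {w: weight_a * slot_a.get(w, 0.0) + (1.0 - weight_a) * slot_b.get(w, 0.0) for w in words}
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(scores), key=scores.__getitem__)
        if best:  # skip slots where the null arc wins
            output.append(best)
    return output


if __name__ == "__main__":
    htr_cn = [{"the": 0.9, "tho": 0.1}, {"kyng": 0.6, "king": 0.4}, {"": 0.7, "of": 0.3}]
    asr_cn = [{"the": 0.8, "a": 0.2}, {"king": 0.9, "kyng": 0.1}, {"": 0.6, "off": 0.4}]
    print(combine_confusion_networks(htr_cn, asr_cn))
```

In this toy example the speech network corrects the handwriting recognizer's "kyng" to "king", while both modalities agree that the third slot should be empty, so the combined output is the two words "the king".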