Proceedings of the 4th International Workshop on Multilingual OCR 2013
DOI: 10.1145/2505377.2505392

Ruling-based table analysis for noisy handwritten documents

Cited by 7 publications (3 citation statements)
References 21 publications
“…2) Datasets like PC32 (Lin et al, 2020) fail to accurately reflect the real-world distribution of data, with high-resource languages like French and German disproportionately represented by 40 million and 4 million sentences, respectively. 3) Linguistic diversity, a critical factor, is often overlooked in datasets such as Europarl (Koehn, 2005) and MultiUN (Chen and Eisele, 2012). 4) Lastly, systematic zero-shot NMT evaluations are rarely found in existing MNMT datasets, either missing entirely or covering less than 1% of possible zero-shot combinations (Aharoni et al, 2019; Pan et al, 2021; Tang et al, 2021).…”
Section: Ec40 Dataset (mentioning)
confidence: 99%
“…Ziemski et al (2016) created the United Nations Parallel Corpus, which consists of over 2 million words of parallel texts in 6 official languages, including English and Arabic. Another work that includes Arabic is the multilingual parallel corpus MultiUN (Chen and Eisele, 2012). It extends the United Nations Parallel Corpus by including texts from various sources such as the United Nations and other international organisations.…”
Section: Related Work (mentioning)
confidence: 99%
“…Chen and Lopresti [2] use a probabilistic alternative of Hough transform to detect lines in the document. In order to ensure high recall of table rulings, some lines are excluded based on the fact that the table ruling lines are parallel or orthogonal.…”
Section: Introduction (mentioning)
confidence: 99%
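
The orientation constraint mentioned in the last statement (table rulings are mutually parallel or orthogonal) can be illustrated with a short sketch. This is not Chen and Lopresti's implementation: the function names, the 3-degree tolerance, and the length-weighted choice of dominant orientation are all assumptions made for illustration. The input is a list of line segments as `(x1, y1, x2, y2)` tuples, such as those a probabilistic Hough transform (for example OpenCV's `cv2.HoughLinesP`) might return.

```python
import math


def segment_angle(seg):
    """Orientation of a segment in degrees, folded into [0, 180)."""
    x1, y1, x2, y2 = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def angular_distance(a, b):
    """Smallest difference between two orientations (period 180 degrees)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def off_axis(angle, theta):
    """Deviation of `angle` from the nearer of theta and theta + 90."""
    return min(angular_distance(angle, theta),
               angular_distance(angle, theta + 90.0))


def filter_ruling_candidates(segments, tolerance_deg=3.0):
    """Keep segments parallel or orthogonal to the dominant orientation.

    Hypothetical helper: the dominant orientation is taken to be the
    segment angle supported by the greatest total segment length.
    """
    if not segments:
        return []
    angles = [segment_angle(s) for s in segments]
    lengths = [math.hypot(s[2] - s[0], s[3] - s[1]) for s in segments]

    def support(theta):
        # Total length of segments aligned with theta or theta + 90.
        return sum(length for a, length in zip(angles, lengths)
                   if off_axis(a, theta) <= tolerance_deg)

    dominant = max(angles, key=support)
    return [s for s, a in zip(segments, angles)
            if off_axis(a, dominant) <= tolerance_deg]


if __name__ == "__main__":
    candidates = [
        (0, 0, 100, 0),    # horizontal ruling
        (0, 50, 100, 51),  # slightly skewed horizontal ruling
        (10, 0, 10, 80),   # vertical ruling
        (0, 0, 50, 50),    # diagonal stroke, likely handwriting
    ]
    for seg in filter_ruling_candidates(candidates):
        print(seg)
```

In this toy input the diagonal stroke is discarded while the two near-horizontal segments and the vertical one are kept. On real scans the tolerance would need tuning to the amount of skew and noise.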