2018 IEEE International Conference on Big Data (Big Data) 2018
DOI: 10.1109/bigdata.2018.8622136

TMIXT: A process flow for Transcribing MIXed handwritten and machine-printed Text

Abstract: Handling large corpora of documents is of significant importance in many fields, nowhere more so than in crime investigation and defence, where an organisation may be presented with a large volume of scanned documents that must be processed in a finite time. This problem is exacerbated both by the volume of scanned documents and by the complexity of the pages to be processed, which often contain many different elements that each need to be processed and understood. Text …

Citations: cited by 3 publications (7 citation statements)
References: 34 publications
“…To explore the advancement in TIE techniques, [57] and as encoder in attention mechanism outperformed others [56]. Although, these techniques are showing promising results, but diversity in data sources makes the system complex [55]. The effectiveness of these techniques for complex, diverse, high dimensional and heterogeneous datasets must be investigated.…”
Section: Text Recognition (mentioning)
confidence: 99%
“…Unstructured big data comes with high dimensionality [16,18,66], diversity [55,124], dynamicity [32] and heterogeneity [33,131]. Dimensionality reduction [18] and semantic annotation [131] can further improve the IE performance of high dimensional and heterogeneous data respectively.…”
Section: Dimensionality and Heterogeneity (mentioning)
confidence: 99%
“…To the best of our knowledge, this is the first work reported on applying neural network model on mixed text recognition. We apply our postprocessing approach to the output of the pipeline proposed by [15] for mixed text recognition over IAM handwriting database [25] to show the effectiveness of neural network based natural language generation on the improvement of OCR accuracy.…”
Section: Related Work (mentioning)
confidence: 99%
“…Table I shows total number of characters/tokens before and after cleaning the data and also total number of unique characters/tokens after cleaning data for both train files and test files (i.e. result of applying TMIXT [15] on IAM handwriting database for text recognition). The Vocabulary size, bolded in table I, shows the number of cleaned and unique characters/tokens (words) for character level and word level language models.…”
Section: A Data Set and Analysis (mentioning)
confidence: 99%