Proceedings of the First Workshop on Scholarly Document Processing 2020
DOI: 10.18653/v1/2020.sdp-1.9

Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain

Abstract: We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed …

Cited by 1 publication (1 citation statement)
References 19 publications

“…What is more, inpainted backgrounds are only required if highlighting detection is desired: For text-only alignment, plain scans are sufficient. The actual highlighting extraction works as follows (see Müller et al (2020) for details): Since document highlighting comes mostly in strong colors, which are characterized by large differences among their three component values in the RGB color model, we create a binarized version of each page by going over all pixels in the background image and setting each pixel to 1 if the pairwise differences between the R, G, and B components are above a certain threshold (50), and to 0 otherwise. This yields an image with regions of higher and lower density of black pixels.…”
Section: Highlighting Detection
Citation type: mentioning
confidence: 99%
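
The binarization step described in the citation statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `binarize_highlighting` and the nested-list image representation are hypothetical. The statement does not say whether all or only some of the pairwise differences between R, G, and B must exceed the threshold. The sketch assumes that any one difference is enough, which is equivalent to testing whether the largest channel value minus the smallest exceeds the threshold. Under this reading a yellow highlighter pixel such as (255, 255, 0) is detected even though its R and G values are equal.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each component in 0..255


def binarize_highlighting(image: List[List[Pixel]], threshold: int = 50) -> List[List[int]]:
    """Return a 0/1 mask with 1 wherever a pixel looks strongly colored.

    Assumption (not stated in the source): a pixel qualifies if ANY pairwise
    difference between its R, G, and B components is above the threshold.
    That is the same as checking max(R, G, B) - min(R, G, B) > threshold.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            spread = max(r, g, b) - min(r, g, b)  # largest pairwise difference
            mask_row.append(1 if spread > threshold else 0)
        mask.append(mask_row)
    return mask


# Tiny illustrative "page": white paper, yellow highlighting, black text, gray noise.
page = [
    [(255, 255, 255), (255, 255, 0)],
    [(20, 20, 20), (128, 130, 126)],
]
mask = binarize_highlighting(page)
print(mask)
```

Only the yellow pixel is set to 1 here. White, black, and gray pixels have nearly equal components, so they fall below the threshold of 50 given in the statement. On a real scan the input would be the pixel data of the background image, and the dense regions of 1s in the resulting mask would mark candidate highlighted areas.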