Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-demo.31

PAWLS: PDF Annotation With Labels and Structure

Abstract: Adobe's Portable Document Format (PDF) is a popular way of distributing view-only documents with a rich visual markup. This presents a challenge to NLP practitioners who wish to use the information contained within PDF documents for training models or data analysis, because annotating these documents is difficult. In this paper, we present PDF Annotation with Labels and Structure (PAWLS), a new annotation tool designed specifically for the PDF document format. PAWLS is particularly suited for mixed-mode annota…

Cited by 5 publications (2 citation statements)
References 17 publications
“…S2-VL is manually labeled by graduate students who frequently read scientific papers. Using the PAWLS annotation tool (Neumann et al., 2021), annotators draw rectangular text blocks directly on each PDF page, and specify the block-level semantic categories from 15 possible candidates. Tokens within a group can therefore inherit the category from the parent text block.…”
Section: S2-VL (mentioning; confidence: 99%)
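The label-inheritance step this statement describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the S2-VL or PAWLS implementation: the `Box`, `Token`, `TextBlock`, and `inherit_labels` names are hypothetical, the category names are made up, coordinates are assumed to be page-space `(x0, y0, x1, y1)` rectangles, and a token is assumed to belong to a block when its center point falls inside that block's rectangle on the same page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    # Axis-aligned rectangle in page coordinates (assumed convention).
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass
class Token:
    text: str
    page: int
    box: Box


@dataclass
class TextBlock:
    # A rectangle an annotator drew, plus its block-level category.
    page: int
    box: Box
    category: str


def inherit_labels(tokens: List[Token], blocks: List[TextBlock]) -> List[Optional[str]]:
    """Give each token the category of the block containing its center (hypothetical rule)."""
    labels: List[Optional[str]] = []
    for tok in tokens:
        cx, cy = tok.box.center()
        label = None  # tokens outside every block stay unlabeled
        for blk in blocks:
            if blk.page == tok.page and blk.box.contains_point(cx, cy):
                label = blk.category
                break  # first matching block wins if blocks overlap
        labels.append(label)
    return labels


if __name__ == "__main__":
    blocks = [
        TextBlock(page=0, box=Box(40, 45, 560, 65), category="Section"),
        TextBlock(page=0, box=Box(40, 68, 560, 300), category="Paragraph"),
    ]
    tokens = [
        Token("Abstract", 0, Box(50, 50, 110, 62)),
        Token("We", 0, Box(50, 70, 65, 82)),
        Token("Footer", 0, Box(50, 760, 100, 772)),
    ]
    print(inherit_labels(tokens, blocks))  # ['Section', 'Paragraph', None]
```

Tokens outside every drawn block come back as `None`. When blocks overlap, this sketch lets the first matching block win; choosing the block with the largest overlap area would be an equally reasonable alternative.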
“…We have addressed these layout-centric pinch points in the OKN construction process in two ways. First, we created PAWLS (PDF Annotation with Labels and Structure), a new annotation tool designed for PDF documents (Neumann, Shen, and Skjonsberg 2021). PAWLS supports labeling span-based textual regions, free form visual bounding boxes, and easy authoring of n-ary relations among different visual elements (see Figure 1).…”
Section: Layout-aware Document Processing (mentioning; confidence: 99%)
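The three kinds of annotation this statement lists can be modelled with a small data structure. The sketch below is hypothetical: it is not PAWLS's actual schema or export format, and the class names, the `(left, top, right, bottom)` bounds convention, and the `dangling_relations` check are illustrative assumptions. A span-based textual region is represented as a box plus the indices of the PDF tokens it covers, a free-form visual box has no token list, and a relation links any number of annotations by id.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    ann_id: str
    label: str
    page: int
    bounds: Tuple[float, float, float, float]  # (left, top, right, bottom), assumed
    # Indices of covered PDF tokens for a span-based region; None for a free-form box.
    token_ids: Optional[List[int]] = None

    @property
    def is_freeform(self) -> bool:
        return self.token_ids is None


@dataclass
class Relation:
    label: str
    argument_ids: List[str]  # n-ary: any number of annotation ids, in order


@dataclass
class DocumentAnnotations:
    annotations: List[Annotation] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def dangling_relations(self) -> List[Relation]:
        """Relations that mention an annotation id not present in this document."""
        known = {a.ann_id for a in self.annotations}
        return [r for r in self.relations if any(i not in known for i in r.argument_ids)]


if __name__ == "__main__":
    doc = DocumentAnnotations(
        annotations=[
            Annotation("a1", "Caption", 0, (72, 400, 540, 430), token_ids=[118, 119, 120]),
            Annotation("a2", "Figure", 0, (72, 150, 540, 390)),  # free-form box
        ],
        relations=[
            Relation("caption_of", ["a1", "a2"]),
            Relation("refers_to", ["a1", "a9"]),  # a9 does not exist
        ],
    )
    print([a.is_freeform for a in doc.annotations])  # [False, True]
    print([r.label for r in doc.dangling_relations()])  # ['refers_to']
```

Storing relation arguments as an ordered list rather than a fixed source/target pair is what makes relations n-ary in this sketch; a check like `dangling_relations` is one way to catch relations left pointing at boxes that were later deleted.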