Proceedings of the 2019 Conference of the North 2019
DOI: 10.18653/v1/n19-2005

Graph Convolution for Multimodal Information Extraction from Visually Rich Documents

Abstract: Visually rich documents (VRDs) are ubiquitous in daily business and life. Examples are purchase receipts, insurance policy documents, custom declaration forms and so on. In VRDs, visual and layout information is critical for document understanding, and texts in such documents cannot be serialized into the one-dimensional sequence without losing information. Classic information extraction models such as BiLSTM-CRF typically operate on text sequences and do not incorporate visual features. In this paper, we intr…

Cited by 149 publications (103 citation statements)
References 29 publications (22 reference statements)
“…The approaches most similar to ours are GraphIE (Qian et al, 2019) and the approach by Liu et al (2019). Both approaches involve constructing a graph of text fields with edges representing horizontal and vertical adjacency, followed by an application of a GCN.…”
Section: Related Work (mentioning)
confidence: 99%
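
The statement above mentions a graph of text fields with adjacency edges followed by a GCN. As a rough illustration only, here is a minimal sketch of one graph-convolution step in the widely used normalized form ReLU(D^-1/2 (A + I) D^-1/2 H W). The layer form, the toy adjacency matrix, and the names `gcn_layer`, `adj`, `feats`, and `weight` are assumptions made for this example; the cited approaches may define their graphs and layers differently.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    : (n, n) symmetric 0/1 adjacency between text fields (assumed given)
    feats  : (n, d_in) one feature row per text field
    weight : (d_in, d_out) projection matrix (learned in a real model)
    """
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)   # ReLU


# Hypothetical toy layout: field 0 is horizontally adjacent to field 1
# and vertically adjacent to field 2.
adj = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
feats = np.eye(3)            # placeholder one-hot features per field
weight = np.ones((3, 2))     # placeholder weights
out = gcn_layer(adj, feats, weight)
```

After this step, each field's representation mixes its own features with those of its adjacent fields, which is what makes layout neighbors visible to a downstream tagger.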
“…Early methods apply markov random fields [7] or conditional random fields [2] to solve the problem. More recent approaches tend to use CNN [14,22,30], GNN [10], and Bert [28] to improve performance. Despite their good performance, these methods require large amounts of training data which is hard to collect due to privacy reasons.…”
Section: Text Field Labeling (mentioning)
confidence: 99%
“…The task aims to assign a label to each text region in a document so that text information could be extracted in structured formats. Learning based methods [2,10,14,22,30] are shown to have good performance for text field labeling. They could automatically adapt to any type of layouts, but they usually require sufficient training data.…”
Section: Introduction (mentioning)
confidence: 99%
“…BERTgrid (Denk and Reisswig, 2019) is nearly identical, but it replaces the one-hot character encoding with the word's BERT encoding. Liu et al (2019) represent a document as a fully-connected graph where text boxes are nodes. The edge embedding between two nodes incorporates the distance between them, the text boxes' aspect ratios, and their relative sizes.…”
Section: Related Work (mentioning)
confidence: 99%
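
The statement above describes a fully-connected graph whose nodes are text boxes and whose edge embeddings use distance, aspect ratio, and relative size. The sketch below shows one plausible way to compute such raw edge features. The `TextBox` type, the function names, and the exact feature definitions (center offsets, width/height of the source box, target size relative to source height) are assumptions for illustration, not a definition taken from the cited paper.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class TextBox:
    """Hypothetical text box: top-left corner (x, y), width w, height h."""
    text: str
    x: float
    y: float
    w: float
    h: float


def edge_features(a: TextBox, b: TextBox) -> tuple:
    """Raw features for the directed edge a -> b (assumed definitions)."""
    dx = (b.x + b.w / 2) - (a.x + a.w / 2)   # horizontal distance between centers
    dy = (b.y + b.h / 2) - (a.y + a.h / 2)   # vertical distance between centers
    aspect = a.w / a.h                       # aspect ratio of the source box
    rel_h = b.h / a.h                        # target height relative to source height
    rel_w = b.w / a.h                        # target width relative to source height
    return (dx, dy, aspect, rel_h, rel_w)


def build_edges(boxes: list) -> dict:
    """Fully-connected directed graph: one feature tuple per ordered pair."""
    return {
        (i, j): edge_features(boxes[i], boxes[j])
        for i, j in permutations(range(len(boxes)), 2)
    }


boxes = [TextBox("Total", 0, 0, 40, 10), TextBox("12.50", 100, 0, 30, 10)]
edges = build_edges(boxes)
```

In a full model these raw tuples would typically be projected into a learned edge embedding; that step is omitted here.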