Graph Convolution for Multimodal Information Extraction from Visually Rich Documents

Liu, Xiaojing; Gao, Feiyu; Zhang, Qiong; Zhao, Huasha

doi:10.18653/v1/n19-2005

Cited by 149 publications (103 citation statements)

References 29 publications (22 reference statements)

Citation statements by type: Supporting; Mentioning (101); Contrasting


“…The approaches most similar to ours are GraphIE (Qian et al, 2019) and the approach by Liu et al (2019). Both approaches involve constructing a graph of text fields with edges representing horizontal and vertical adjacency, followed by an application of a GCN.…”

Section: Related Work (mentioning)

confidence: 99%
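The citation statement above summarizes a pipeline in which text fields become graph nodes, edges encode horizontal and vertical adjacency, and a GCN then propagates features over the graph. The following is a minimal pure-Python sketch of that idea under stated assumptions: the box format, the adjacency rule (link each box to its nearest right neighbour on an overlapping row and its nearest lower neighbour on an overlapping column), and the names `adjacency_edges` and `gcn_layer` are all hypothetical. The convolution is the standard symmetric-normalized GCN layer of Kipf and Welling, which is not necessarily the exact layer used by either cited work.

```python
import math
from typing import List, Set, Tuple

# A text box as (x0, y0, x1, y1); y grows downward, as in image coordinates.
Box = Tuple[float, float, float, float]


def overlaps(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the intervals [a0, a1] and [b0, b1] overlap by a positive amount."""
    return min(a1, b1) > max(a0, b0)


def adjacency_edges(boxes: List[Box]) -> Set[Tuple[int, int]]:
    """Hypothetical rule: link each box to its nearest neighbour to the right
    (vertically overlapping) and its nearest neighbour below (horizontally
    overlapping). Edges are undirected and stored as (smaller, larger) index."""
    edges: Set[Tuple[int, int]] = set()
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        right = [(bx0 - x1, j) for j, (bx0, by0, bx1, by1) in enumerate(boxes)
                 if j != i and bx0 >= x1 and overlaps(y0, y1, by0, by1)]
        below = [(by0 - y1, j) for j, (bx0, by0, bx1, by1) in enumerate(boxes)
                 if j != i and by0 >= y1 and overlaps(x0, x1, bx0, bx1)]
        for candidates in (right, below):
            if candidates:
                _, j = min(candidates)  # smallest gap wins
                edges.add((min(i, j), max(i, j)))
    return edges


def gcn_layer(n: int, edges: Set[Tuple[int, int]],
              H: List[List[float]], W: List[List[float]]) -> List[List[float]]:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    # Adjacency matrix with self-loops (A + I); symmetric because edges are undirected.
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    deg = [sum(row) for row in A]  # degrees of A + I, always >= 1
    out_dim = len(W[0])
    # Project node features: HW = H @ W.
    HW = [[sum(H[i][k] * W[k][c] for k in range(len(W))) for c in range(out_dim)]
          for i in range(n)]
    # Aggregate neighbour features with symmetric normalization, then apply ReLU.
    return [[max(0.0, sum(A[i][j] / math.sqrt(deg[i] * deg[j]) * HW[j][c]
                          for j in range(n)))
             for c in range(out_dim)]
            for i in range(n)]


if __name__ == "__main__":
    # Four fields laid out as two "label: value" rows.
    boxes = [(0, 0, 50, 10), (60, 0, 100, 10), (0, 20, 30, 30), (60, 20, 110, 30)]
    edges = adjacency_edges(boxes)
    print(sorted(edges))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
    H = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # toy 2-d node features
    print(gcn_layer(len(boxes), edges, H, [[1.0, 0.0], [0.0, 1.0]]))
```

In the toy layout, each label connects to its value on the right and to the label below it, so a single convolution already mixes a field's features with those of its row and column neighbours. Stacking several such layers widens that receptive field.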

ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages

Lockard, Shiralkar, Dong, et al. 2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics


In many documents, such as semi-structured webpages, textual semantics are augmented with additional information conveyed using visual elements including layout, font size, and color. Prior work on information extraction from semi-structured websites has required learning an extraction model specific to a given template via either manually labeled or distantly supervised data from that template. In this work, we propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously unseen template, including from websites with little overlap with existing sources of knowledge for distant supervision and websites in entirely new subject verticals. Our model uses a graph neural network-based approach to build a rich representation of text fields on a webpage and the relationships between them, enabling generalization to new templates. Experiments show this approach provides a 31% F1 gain over a baseline for zero-shot extraction in a new subject vertical.



“…Early methods apply Markov random fields [7] or conditional random fields [2] to solve the problem. More recent approaches tend to use CNN [14,22,30], GNN [10], and BERT [28] to improve performance. Despite their good performance, these methods require large amounts of training data, which is hard to collect due to privacy reasons.…”

Section: Text Field Labeling (mentioning)

confidence: 99%

“…The task aims to assign a label to each text region in a document so that text information could be extracted in structured formats. Learning based methods [2,10,14,22,30] are shown to have good performance for text field labeling. They could automatically adapt to any type of layout, but they usually require sufficient training data.…”

Section: Introduction (mentioning)

confidence: 99%

One-shot Text Field labeling using Attention and Belief Propagation for Structure Information Extraction

Cheng, Qiu, Shi, et al. 2020

Proceedings of the 28th ACM International Conference on Multimedia


Structured information extraction from document images usually consists of three steps: text detection, text recognition, and text field labeling. While text detection and text recognition have been heavily studied and greatly improved in the literature, text field labeling is less explored and still faces many challenges. Existing learning-based methods for the text labeling task usually require a large number of labeled examples to train a specific model for each type of document. However, collecting large amounts of document images and labeling them is difficult and sometimes impossible due to privacy issues. Deploying separate models for each type of document also consumes a lot of resources. Facing these challenges, we explore one-shot learning for the text field labeling task. Existing one-shot learning methods for the task are mostly rule-based and have difficulty labeling fields in crowded regions with few landmarks and fields consisting of multiple separate text regions. To alleviate these problems, we propose a novel deep end-to-end trainable approach for one-shot text field labeling, which uses an attention mechanism to transfer layout information between document images. We further apply a conditional random field to the transferred layout information to refine the field labeling. We collected and annotated a real-world one-shot field labeling dataset with a large variety of document types and conducted extensive experiments to examine the effectiveness of the proposed model. To stimulate research in this direction, the collected dataset and the one-shot model will be released.

CCS Concepts: Applied computing → Document analysis; Optical character recognition; Computing methodologies → Visual content-based indexing and retrieval.
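The abstract describes using attention to transfer layout information from a labeled document to an unlabeled one. The sketch below illustrates only that core idea in its simplest form, under explicit assumptions: each query box attends over the boxes of a single labeled "support" document using a softmax of negative squared distances between normalized box positions, and its label distribution is the attention-weighted sum of the support labels. The position-only features, the temperature value, and the names `transfer_labels` and `softmax` are hypothetical; the paper's model is end-to-end trainable with learned representations and adds a conditional random field refinement, neither of which is reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # normalized (x, y) centre of a text box, in [0, 1]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transfer_labels(support: List[Point], support_labels: List[str],
                    queries: List[Point],
                    temperature: float = 0.01) -> List[Dict[str, float]]:
    """For each query box, attend over the labeled support boxes and return a
    label distribution (the attention-weighted sum of the support labels)."""
    results = []
    for qx, qy in queries:
        # Similarity = negative squared distance between box positions.
        scores = [-((qx - sx) ** 2 + (qy - sy) ** 2) / temperature
                  for sx, sy in support]
        weights = softmax(scores)
        dist: Dict[str, float] = {label: 0.0 for label in support_labels}
        for w, label in zip(weights, support_labels):
            dist[label] += w  # several support boxes may share one label
        results.append(dist)
    return results


if __name__ == "__main__":
    # One labeled "support" document: three fields with known labels.
    support = [(0.8, 0.1), (0.8, 0.9), (0.1, 0.1)]
    labels = ["date", "total", "vendor"]
    # Two boxes from an unlabeled document with a similar layout.
    queries = [(0.78, 0.12), (0.79, 0.88)]
    for dist in transfer_labels(support, labels, queries):
        print(max(dist, key=dist.get))  # prints date, then total
```

Summing attention weights per label lets a field that spans several separate text regions in the support document still receive a single combined score, which is one of the difficulties the abstract attributes to rule-based one-shot methods.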


“…BERTgrid (Denk and Reisswig, 2019) is nearly identical, but it replaces the one-hot character encoding with the word's BERT encoding. Liu et al (2019) represent a document as a fully-connected graph where text boxes are nodes. The edge embedding between two nodes incorporates the distance between them, the text boxes' aspect ratios, and their relative sizes.…”

Section: Related Work (mentioning)

confidence: 99%
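The statement above describes Liu et al. (2019) as building a fully connected graph over text boxes, with an edge embedding that combines the distance between two boxes, their aspect ratios, and their relative sizes. A minimal sketch of such edge features follows. The exact five-component vector, the use of top-left corners for distance, the normalization by the source box's height, and the names `edge_features` and `fully_connected_edges` are assumptions for illustration, not a reproduction of the paper's definition.

```python
from typing import Dict, List, Tuple

# A text box as (x, y, w, h): top-left corner, width and height.
Box = Tuple[float, float, float, float]


def edge_features(bi: Box, bj: Box) -> List[float]:
    """Hypothetical edge embedding from node i to node j.

    Every component is divided by the height of box i so the features do not
    depend on the absolute scale of the document image.
    """
    xi, yi, wi, hi = bi
    xj, yj, wj, hj = bj
    return [
        (xj - xi) / hi,  # horizontal distance between top-left corners
        (yj - yi) / hi,  # vertical distance between top-left corners
        wi / hi,         # aspect ratio of box i
        hj / hi,         # height of box j relative to box i
        wj / hi,         # width of box j relative to box i's height
    ]


def fully_connected_edges(boxes: List[Box]) -> Dict[Tuple[int, int], List[float]]:
    """Every ordered pair of distinct boxes gets an edge with its own features."""
    return {(i, j): edge_features(boxes[i], boxes[j])
            for i in range(len(boxes))
            for j in range(len(boxes))
            if i != j}


if __name__ == "__main__":
    boxes = [(0, 0, 40, 10), (50, 0, 20, 10), (0, 20, 60, 20)]
    edges = fully_connected_edges(boxes)
    print(len(edges))     # 6
    print(edges[(0, 1)])  # [5.0, 0.0, 4.0, 1.0, 2.0]
```

Because every feature is normalized by the source box, the edge from i to j differs from the edge from j to i (here `edges[(2, 0)]` is `[0.0, -1.0, 3.0, 0.5, 2.0]`), so the graph is effectively directed even though it is fully connected. A fully connected graph over n boxes has n(n - 1) such edges, which grows quadratically with document length.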

Layout-Aware Text Representations Harm Clustering Documents by Type

Finegan-Dollak, Verma 2020

Proceedings of the First Workshop on Insights From Negative Results in NLP


Clustering documents by type (grouping invoices with invoices and articles with articles) is a desirable first step for organizing large collections of document scans. Humans approaching this task use both the semantics of the text and the document layout to assist in grouping like documents. LayoutLM (Xu et al., 2019), a layout-aware transformer built on top of BERT with state-of-the-art performance on document-type classification, could reasonably be expected to outperform regular BERT (Devlin et al., 2018) for document-type clustering. However, we find experimentally that BERT significantly outperforms LayoutLM on this task (p < 0.001). We analyze clusters to show where layout awareness is an asset and where it is a liability.

