Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.721

ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages

Abstract: In many documents, such as semi-structured webpages, textual semantics are augmented with additional information conveyed using visual elements including layout, font size, and color. Prior work on information extraction from semi-structured websites has required learning an extraction model specific to a given template via either manually labeled or distantly supervised data from that template. In this work, we propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously …

Cited by 36 publications (40 citation statements)
References 20 publications (26 reference statements)
“…cannot be fully exploited. To this end, the second direction relies on the deep fusion of textual, visual, and layout information from a great number of unlabeled documents in different domains, where pre-training techniques play an important role in learning the cross-modality interaction in an end-to-end fashion (Lockard et al., 2020). In this way, the pre-trained models absorb cross-modal knowledge from different document types, where the local invariance among these layouts and styles is preserved.…”
Section: Introduction
confidence: 99%
“…Besides, graph neural networks are also widely used for event extraction (Liu et al., 2018; Balali et al., 2020; Zhang et al., 2021) and relation and entity extraction (Zhang et al., 2018; Sun et al., 2020). Graph neural networks are also effective at encoding other types of intrinsic structures of a sentence, such as knowledge graphs (Zhang et al., 2019a), document-level relations (Sahu et al., 2019; Lockard et al., 2020), and self-constructed graphs (Kim and Lee, 2012; Zhu et al., 2019; Qian et al., 2019; Sahu et al., 2020). However, all these approaches focus on single IE tasks and cannot scale to extracting a joint information network with entities, relations, and events.…”
Section: Related Work
confidence: 99%
“…The recent surge of interest in automatic information extraction from semi-structured documents is well reflected in the increased number of publications from both the research community and industry (Katti et al., 2018; Qian et al., 2019; Liu et al., 2019; Denk and Reisswig, 2019; Hwang et al., 2019; Xu et al.; Jaume et al., 2019; Zhong et al., 2019; Rausch et al., 2019; Yu et al., 2020; Majumder et al., 2020; Lockard et al., 2020; Garncarek et al., 2020; Lin et al., 2020; Xu et al., 2020; Powalski et al., 2021; Wang et al., 2021; Hong et al., 2021). Below, we summarize some of the closely related works published before the major development of SPADE (https://github.com/clovaai/spade).…”
Section: Related Work
confidence: 99%
“…In contrast, SPADE predicts both the intra-box relationship and the inter-box relationship by constructing a dependency graph among the tokens. Lockard et al. (2019, 2020) also utilize a graph to extract semantic relations from semi-structured webpages. The graph is constructed based on rules over the structured HTML DOM and is mainly used for information encoding.…”
Section: Related Work
confidence: 99%
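The rule-based DOM graph described in the excerpt above can be sketched in a few lines. This is a hypothetical minimal example, not the SPADE or ZeroShotCeres implementation: the class name `DOMGraphBuilder` and the specific edge rules (parent-child links plus adjacent-sibling links among text-bearing nodes) are illustrative assumptions standing in for the papers' actual rules.

```python
# Minimal sketch (assumed rules, not the authors' code): build a graph over
# the nodes of an HTML DOM, adding parent-child edges and adjacent-sibling
# edges, so that downstream encoders can use the page structure.
from html.parser import HTMLParser

class DOMGraphBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # ids of currently open tags
        self.nodes = []      # per node: [tag, accumulated text]
        self.edges = set()   # directed (src_id, dst_id) pairs
        self.children = {}   # parent id -> ordered child ids

    def handle_starttag(self, tag, attrs):
        node_id = len(self.nodes)
        self.nodes.append([tag, ""])
        if self.stack:
            parent = self.stack[-1]
            self.edges.add((parent, node_id))        # parent-child edge
            sibs = self.children.setdefault(parent, [])
            if sibs:
                self.edges.add((sibs[-1], node_id))  # adjacent-sibling edge
            sibs.append(node_id)
        self.stack.append(node_id)

    def handle_data(self, data):
        # Attach visible text to the innermost open node.
        if self.stack and data.strip():
            self.nodes[self.stack[-1]][1] += data.strip()

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

builder = DOMGraphBuilder()
builder.feed("<div><h1>Title</h1><p>Born</p><p>1990</p></div>")
# builder.nodes now holds the div, h1, and two p nodes with their text;
# builder.edges links div->children and h1->p->p as adjacent siblings.
```

A real system would add further edge types (e.g. visual adjacency from rendered coordinates), but the parent-child plus sibling skeleton above is enough to show how DOM rules yield a graph for information encoding.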