Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.721

ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages

Abstract: In many documents, such as semi-structured webpages, textual semantics are augmented with additional information conveyed using visual elements including layout, font size, and color. Prior work on information extraction from semi-structured websites has required learning an extraction model specific to a given template via either manually labeled or distantly supervised data from that template. In this work, we propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously …

Cited by 36 publications (40 citation statements)
References 20 publications (26 reference statements)
“…cannot be fully exploited. To this end, the second direction relies on the deep fusion of textual, visual, and layout information from a great number of unlabeled documents in different domains, where pre-training techniques play an important role in learning the cross-modality interaction in an end-to-end fashion (Lockard et al., 2020). In this way, the pre-trained models absorb cross-modal knowledge from different document types, where the local invariance among these layouts and styles is preserved.…”
Section: Introduction
confidence: 99%
“…Besides, graph neural networks are also widely used for event extraction (Liu et al., 2018; Balali et al., 2020; Zhang et al., 2021) and relation and entity extraction (Zhang et al., 2018; Sun et al., 2020). Graph neural networks are also effective at encoding other types of intrinsic structures of a sentence, such as knowledge graphs (Zhang et al., 2019a), document-level relations (Sahu et al., 2019; Lockard et al., 2020), and self-constructed graphs (Kim and Lee, 2012; Zhu et al., 2019; Qian et al., 2019; Sahu et al., 2020). However, all these approaches focus on single IE tasks and cannot scale to extracting a joint information network with entities, relations, and events.…”
Section: Related Work
confidence: 99%
“…The recent surge of interest in automatic information extraction from semi-structured documents is well reflected in the increased number of publications from both the research community and industry (Katti et al., 2018; Qian et al., 2019; Liu et al., 2019; Denk and Reisswig, 2019; Hwang et al., 2019; Xu et al.; Jaume et al., 2019; Zhong et al., 2019; Rausch et al., 2019; Yu et al., 2020; Majumder et al., 2020; Lockard et al., 2020; Garncarek et al., 2020; Lin et al., 2020; Xu et al., 2020; Powalski et al., 2021; Wang et al., 2021; Hong et al., 2021). Below, we summarize some of the closely related works published before the major development of SPADE (https://github.com/clovaai/spade).…”
Section: Related Work
confidence: 99%
“…In contrast, SPADE predicts both the intra-box relationship and the inter-box relationship by constructing a dependency graph among the tokens. Lockard et al. (2019, 2020) also utilize a graph to extract semantic relations from semi-structured webpages. The graph is constructed based on rules over the structured HTML DOM and is mainly used for information encoding.…”
Section: Related Work
confidence: 99%
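The rule-based DOM graph described in the excerpt above can be sketched in a few lines. This is a hypothetical minimal example, not the SPADE or ZeroShotCeres implementation: the class name `DOMGraphBuilder` and the specific edge rules (parent-child links plus adjacent-sibling links among text-bearing nodes) are illustrative assumptions standing in for the papers' actual rules.

```python
# Minimal sketch (assumed rules, not the authors' code): build a graph over
# the nodes of an HTML DOM, adding parent-child edges and adjacent-sibling
# edges, so that downstream encoders can use the page structure.
from html.parser import HTMLParser

class DOMGraphBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # ids of currently open tags
        self.nodes = []      # per node: [tag, accumulated text]
        self.edges = set()   # directed (src_id, dst_id) pairs
        self.children = {}   # parent id -> ordered child ids

    def handle_starttag(self, tag, attrs):
        node_id = len(self.nodes)
        self.nodes.append([tag, ""])
        if self.stack:
            parent = self.stack[-1]
            self.edges.add((parent, node_id))        # parent-child edge
            sibs = self.children.setdefault(parent, [])
            if sibs:
                self.edges.add((sibs[-1], node_id))  # adjacent-sibling edge
            sibs.append(node_id)
        self.stack.append(node_id)

    def handle_data(self, data):
        # Attach visible text to the innermost open node.
        if self.stack and data.strip():
            self.nodes[self.stack[-1]][1] += data.strip()

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

builder = DOMGraphBuilder()
builder.feed("<div><h1>Title</h1><p>Born</p><p>1990</p></div>")
# builder.nodes now holds the div, h1, and two p nodes with their text;
# builder.edges links div->children and h1->p->p as adjacent siblings.
```

A real system would add further edge types (e.g. visual adjacency from rendered coordinates), but the parent-child plus sibling skeleton above is enough to show how DOM rules yield a graph for information encoding.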