2020
DOI: 10.48550/arxiv.2005.07105
Preprint

ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages

Abstract: In many documents, such as semi-structured webpages, textual semantics are augmented with additional information conveyed using visual elements including layout, font size, and color. Prior work on information extraction from semi-structured websites has required learning an extraction model specific to a given template via either manually labeled or distantly supervised data from that template. In this work, we propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously …

Cited by 2 publications (4 citation statements)
References 4 publications

“…Tanaka et al, as well as Chen et al, introduced web-oriented reading comprehension datasets VisualMRC [35] and WebSRC [4], respectively, requiring models to understand the spatial structure of webpages as well as the textual content to answer corresponding questions. At the same time, many approaches [27,32,39] employ graph neural networks to encode node relationships in webpages. Additionally, large language models [13] have been proven to possess strong webpage understanding capabilities via few-shot learning.…”
Section: Webpages Understanding (mentioning)
confidence: 99%
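The statement above notes that several approaches [27,32,39] use graph neural networks to encode relationships between webpage nodes. The following is a minimal Python sketch of a single message-passing step of the kind such models stack, assuming each text node already has a feature vector and that edges (for example, between visually adjacent nodes) are given. The function name `message_passing_step`, the fixed `self_weight` mixing, and the toy node names are hypothetical; a real GNN layer would also apply learned weight matrices and a nonlinearity, and this is not the architecture of any cited work.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing_step(
    features: Dict[str, Vector],
    edges: List[Tuple[str, str]],
    self_weight: float = 0.5,
) -> Dict[str, Vector]:
    """One round of mean-aggregation message passing on an undirected graph.

    Each node's new vector mixes its own vector (weight `self_weight`) with the
    mean of its neighbours' vectors (weight 1 - self_weight). Hypothetical
    simplification: no learned weights and no nonlinearity.
    """
    neighbours: Dict[str, List[str]] = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated: Dict[str, Vector] = {}
    for node, vec in features.items():
        nbrs = neighbours[node]
        if not nbrs:
            # Isolated nodes keep their own representation.
            updated[node] = list(vec)
            continue
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(vec))]
        updated[node] = [self_weight * v + (1 - self_weight) * m for v, m in zip(vec, mean)]
    return updated


# Toy page: a "Director:" label next to its value, plus an unrelated footer node.
features = {"label": [1.0, 0.0], "value": [0.0, 1.0], "footer": [0.0, 0.0]}
edges = [("label", "value")]
print(message_passing_step(features, edges))
# → {'label': [0.5, 0.5], 'value': [0.5, 0.5], 'footer': [0.0, 0.0]}
```

After one step the label and value vectors have absorbed information from each other while the isolated footer is unchanged; stacking k such steps lets information travel k hops along the graph.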
“…Render-full [15] employs visual features to express the distances between node blocks rendered with the web browser. Visual distances are proven a good feature to encode the neighboring relationships among nodes [25] but this method requires the time-consuming rendering process and needs extra memory space to save the images, CSS, and JavaScripts that can easily be out-of-date. In specific, Render-full employs a sophisticated heuristic algorithm to compute the visual distances, which gives the best performance [15], compared to other variants Render-PL and Render-IP.…”
Section: Baseline Models (mentioning)
confidence: 99%
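Render-full's own distance heuristic is not described in the statement above. As a point of comparison only, the sketch below computes one simple notion of visual distance, the shortest gap between two axis-aligned bounding boxes, assuming each rendered node's box (left, top, width, height in pixels) has already been obtained from a browser. The `Box` type, `box_gap`, `nearest_neighbour`, and the example coordinates are hypothetical and are not the algorithm used in [15].

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a rendered DOM node (hypothetical type), in pixels."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def box_gap(a: Box, b: Box) -> float:
    """Shortest Euclidean distance between two boxes; 0.0 if they touch or overlap."""
    dx = max(0.0, max(a.x, b.x) - min(a.x + a.width, b.x + b.width))
    dy = max(0.0, max(a.y, b.y) - min(a.y + a.height, b.y + b.height))
    return math.hypot(dx, dy)


def nearest_neighbour(boxes: Dict[str, Box], name: str) -> str:
    """Return the name of the box visually closest to boxes[name]."""
    others = [n for n in boxes if n != name]
    return min(others, key=lambda n: box_gap(boxes[name], boxes[n]))


boxes = {
    "label": Box(0, 0, 100, 20),      # e.g. a "Director:" label
    "value": Box(120, 0, 50, 20),     # same row, 20 px to the right
    "caption": Box(0, 50, 100, 20),   # 30 px below the label
}
print(box_gap(boxes["label"], boxes["value"]))    # → 20.0
print(box_gap(boxes["label"], boxes["caption"]))  # → 30.0
print(nearest_neighbour(boxes, "label"))          # → value
```

Any distance of this kind still depends on rendering the page first, so it carries the time and storage costs that the statement attributes to Render-full.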
“…Relation extraction associates pairs of named entities and identifies a pre-defined relationship between them. Closed relation extraction defines a closed set of relation types including a special type indicating "no relation" while open relation extraction conducts a binary classification of whether there exists a relationship between the two entities [1,24,25,32,46]. Composite extraction aims to extract more complex concepts such as reviews, opinions, and sentiment mentions.…”
Section: Related Work, 5.1 Web Information Extraction (mentioning)
confidence: 99%
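The distinction drawn above between closed and open relation extraction can be made concrete as two different decisions over the same entity pair. In this minimal sketch, a model (not shown) is assumed to have produced either a score per relation type or a single probability that some relation holds; the label names, `NO_RELATION`, the 0.5 threshold, and both function names are hypothetical.

```python
from typing import Dict

NO_RELATION = "no_relation"  # hypothetical special label for the closed setting


def closed_relation(scores: Dict[str, float]) -> str:
    """Closed RE: pick the best label from a fixed inventory that includes NO_RELATION."""
    if NO_RELATION not in scores:
        raise ValueError("closed label set must include NO_RELATION")
    return max(scores, key=lambda label: scores[label])


def open_relation(probability: float, threshold: float = 0.5) -> bool:
    """Open RE: binary decision on whether any relation holds between the two entities."""
    return probability >= threshold


# Hypothetical model outputs for one (film title, person name) pair.
print(closed_relation({"directed_by": 2.1, "release_date": 0.3, NO_RELATION: 1.0}))  # → directed_by
print(open_relation(0.8))  # → True
```

The open decision only says whether a relation exists between the pair; unlike the closed decision, it does not by itself name the relation.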