Ceres

Lockard, Colin; Dong, Xin Luna; Einolghozati, Arash; Shiralkar, Prashant

doi:10.14778/3231751.3231758

Cited by 35 publications

(4 citation statements)

References 27 publications

“…Since this approach requires training data for each template targeted for extraction, recent work has focused on reducing the manual work needed per site. Fonduer (Wu et al, 2018) provides an interface for easily creating training data, Vertex (Gulhane et al, 2011) uses semi-supervision to minimize the number of labels needed, LODIE (Gentile et al, 2015) and Ceres (Lockard et al, 2018) automatically generate training data based on distant supervision, and DIADEM (Furche et al, 2014) identifies matching rules for specific entity types.…”

Section: Related Work (mentioning)

confidence: 99%

“…For this work, we assume the page topic entity has already been identified, (such as by the method proposed by Lockard et al (2018) or by using the HTML title tag) and thus limit ourselves to identifying the objects and corresponding relations. We consider the following two settings: Relation Extraction (ClosedIE): Let R define a closed set of relation types, including a special type indicating "No Relation".…”

Section: Relation Extraction (mentioning)

confidence: 99%

“…To circumvent manual data annotation, previous work used a distant supervision process requiring a knowledge base aligned to the website targeted for extraction (Gentile et al, 2015; Lockard et al, 2018), including for OpenIE extraction (Banko et al, 2007; Bronzi et al, 2013; Lockard et al, 2019). These methods, however, can only learn a website-specific model based on seed knowledge for the site, but cannot be generalized to the majority of websites with knowledge from new verticals, by long-tail specialists, and in different languages.…”

Section: Introduction (mentioning)

confidence: 99%

ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages

Lockard, Shiralkar, Dong, et al. 2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Self Cite

In many documents, such as semi-structured webpages, textual semantics are augmented with additional information conveyed using visual elements including layout, font size, and color. Prior work on information extraction from semi-structured websites has required learning an extraction model specific to a given template via either manually labeled or distantly supervised data from that template. In this work, we propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously unseen template, including from websites with little overlap with existing sources of knowledge for distant supervision and websites in entirely new subject verticals. Our model uses a graph neural network-based approach to build a rich representation of text fields on a webpage and the relationships between them, enabling generalization to new templates. Experiments show this approach provides a 31% F1 gain over a baseline for zero-shot extraction in a new subject vertical.

“…However, in practice, active-learning methods not only constantly require human effort in building specialized annotation tools [24] but also need humans to label samples for each new site of interest. A recent work by Lockard et al [25] attempted to use additional knowledge bases as distant supervision to automatically label some samples in the target websites and then learn a machine learning model on such noisy-labeled data. A large and comprehensive knowledge base is not always accessible and available for every domain.…”

Section: Related Work (mentioning)

confidence: 99%

FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents

Lin, Sheng, et al. 2020

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Extracting structured data from HTML documents is a long-studied problem with a broad range of applications like augmenting knowledge bases, supporting faceted search, and providing domain-specific experiences for key verticals like shopping and movies. Previous approaches have either required a small number of examples for each target site or relied on carefully handcrafted heuristics built over visual renderings of websites. In this paper, we present a novel two-stage neural approach, named FreeDOM, which overcomes both these limitations. The first stage learns a representation for each DOM node in the page by combining both the text and markup information. The second stage captures longer range distance and semantic relatedness using a relational neural network. By combining these stages, FreeDOM is able to generalize to unseen sites after training on a small number of seed sites from that vertical without requiring expensive hand-crafted features over visual renderings of the page. Through experiments on a public dataset with 8 different verticals, we show that FreeDOM beats the previous state of the art by nearly 3.7 F1 points on average without requiring features over rendered pages or expensive hand-crafted features.

Best from Top k Versus Top 1: Improving Distant Supervision Relation Extraction with Deep Reinforcement Learning

Gui, Liu, et al. 2019

Advances in Knowledge Discovery and Data Mining
