2018
DOI: 10.14778/3231751.3231758

Ceres

Abstract: The web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and recall only when manual annotations for each website are available. Although there have been efforts to learn extractors from automatically generated labels, these methods are not sufficiently robust to succeed in settings with complex schemas and information-rich web…
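The abstract's core idea is to learn extractors from labels generated automatically against an existing knowledge base (distant supervision). A deliberately naive sketch of the labeling step follows. Everything in it is an assumption for illustration: the helper names `normalize` and `distant_labels`, the `(xpath, text)` node representation, the toy knowledge base, and exact string matching as the matching rule. It is not the Ceres algorithm, whose details are not given in this excerpt.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge-base shape: subject -> list of (relation, object) facts.
KB = Dict[str, List[Tuple[str, str]]]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences still match."""
    return " ".join(text.lower().split())


def distant_labels(topic: str, kb: KB, nodes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Label DOM text nodes whose text equals a known object of the topic entity.

    nodes: (xpath, text) pairs for the page's text fields.
    Returns (xpath, relation) pairs; nodes with no match get no label.
    """
    # Index the topic's known objects by normalized text; one object may carry several relations.
    by_object: Dict[str, List[str]] = {}
    for relation, obj in kb.get(topic, []):
        by_object.setdefault(normalize(obj), []).append(relation)

    labels: List[Tuple[str, str]] = []
    for xpath, text in nodes:
        for relation in by_object.get(normalize(text), []):
            labels.append((xpath, relation))
    return labels


# Toy example (hypothetical data): one film page, already known to be about its topic entity.
kb: KB = {"Do the Right Thing": [("directed_by", "Spike Lee"), ("release_year", "1989")]}
nodes = [
    ("/html/body/h1", "Do the Right Thing"),
    ("/html/body/div[2]/span", "Spike  Lee"),
    ("/html/body/div[3]/span", "1989"),
    ("/html/body/div[4]/span", "Drama"),
]
print(distant_labels("Do the Right Thing", kb, nodes))
# [('/html/body/div[2]/span', 'directed_by'), ('/html/body/div[3]/span', 'release_year')]
```

Exact matching like this is noisy: if the director's name also appeared in a cast list on the same page, that node would be labeled `directed_by` as well. Noise of this kind is one plausible reason automatically generated labels need further filtering before an extractor is trained on them.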

Cited by 35 publications (4 citation statements)
References 27 publications
“…Since this approach requires training data for each template targeted for extraction, recent work has focused on reducing the manual work needed per site. Fonduer (Wu et al, 2018) provides an interface for easily creating training data, Vertex (Gulhane et al, 2011) uses semi-supervision to minimize the number of labels needed, LODIE (Gentile et al, 2015) and Ceres (Lockard et al, 2018) automatically generate training data based on distant supervision, and DIADEM (Furche et al, 2014) identifies matching rules for specific entity types.…”
Section: Related Work (mentioning, confidence: 99%)
“…For this work, we assume the page topic entity has already been identified, (such as by the method proposed by Lockard et al (2018) or by using the HTML title tag) and thus limit ourselves to identifying the objects and corresponding relations. We consider the following two settings: Relation Extraction (ClosedIE): Let R define a closed set of relation types, including a special type indicating "No Relation".…”
Section: Relation Extraction (mentioning, confidence: 99%)
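The citation statement above names the HTML title tag as one simple way to identify the page topic entity. A minimal sketch of that option using Python's standard-library `html.parser` follows. The class and function names (`TitleParser`, `topic_from_title`) and the optional `separator` heuristic for trimming a site-name suffix are hypothetical and not taken from the cited work.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self._in_title = False
        self._done = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names are already lowercased by HTMLParser.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def topic_from_title(html: str, separator: Optional[str] = None) -> Optional[str]:
    """Return the page title as a topic-entity guess, or None if there is no title.

    separator: hypothetical heuristic; if given, keep only the text before it
    (e.g. to drop a " - Site Name" suffix).
    """
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())  # collapse whitespace
    if separator and separator in title:
        title = title.split(separator)[0].strip()
    return title or None


page = (
    "<html><head><title>Do the Right Thing (1989) - Example Site</title></head>"
    "<body><h1>Do the Right Thing</h1></body></html>"
)
print(topic_from_title(page, " - "))  # Do the Right Thing (1989)
```

A title-based guess is cheap but site-dependent: many sites append their own name or other decoration to the title, which is why the sketch exposes a separator rather than assuming one.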
“…However, in practice, active-learning methods not only constantly require human effort in building specialized annotation tools [24] but also need humans to label samples for each new site of interest. A recent work by Lockard et al [25] attempted to use additional knowledge bases as distant supervision to automatically label some samples in the target websites and then learn a machine learning model on such noisy-labeled data. A large and comprehensive knowledge base is not always accessible and available for every domain.…”
Section: Related Work (mentioning, confidence: 99%)