Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2014
DOI: 10.3115/v1/p14-1037

Zero-shot Entity Extraction from Web Pages

Abstract: In order to extract entities of a fine-grained category from semi-structured data in web pages, existing information extraction systems rely on seed examples or redundancy across multiple web pages. In this paper, we consider a new zero-shot learning task of extracting entities specified by a natural language query (in place of seeds) given only a single web page. Our approach defines a log-linear model over latent extraction predicates, which select lists of entities from the web page. The main challenge is t…

Cited by 43 publications (30 citation statements)
References: 23 publications

“…Ritter et al (2011) looked into recognizing entities from social media data that involves informal and potentially noisy texts. Pasupat and Liang (2014) looked into the issue of zero-shot entity extraction from Web pages with natural language queries where minimal supervision was used. Neelakantan and Collins (2014) looked into the problem of automatically constructing dictionaries with minimal supervision for improved named entity extraction.…”
Section: Related Work (mentioning)
confidence: 99%
“…We extract a set of features from the web page for the event page classifier, including term frequency, web entity [27], anchors, URL segments and page title. we remove features that occur on less than 100 different domains to make sure the features are general enough and not specific to our training set.…”
Section: Event Extraction, 4.1 Event Page Classifier (mentioning)
confidence: 99%
“…In this paradigm, facts from existing knowledge bases are paired with unlabeled documents to create noisy or “weakly” labeled training examples [1, 25, 26, 28]. In addition to existing knowledge bases, crowdsourcing [12] and heuristics from domain experts [29] have also proven to be effective weak supervision sources. In our work, we show that by incorporating all kinds of supervision in one framework in a noise-aware way, we are able to achieve high quality in knowledge base construction.…”
Section: Related Work (mentioning)
confidence: 99%