2010
DOI: 10.14778/1920841.1920916

Automatic rule refinement for information extraction

Abstract: Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill and standard practice is to refine rules iteratively, with substantial effort. In this paper, we show that techniques developed in the context of data provenance, to determine the lineage of a tuple in a database, can be leveraged to assist in rule refinement. Specifically, given a …

Cited by 31 publications (41 citation statements)
References 21 publications
“…Specifically, view update naturally arises when debugging Information Extraction (IE) programs, which can be highly complicated [23]. As a concrete example, the MIDAS system [1] extracts basic relations from multiple (publicly available) financial data sources, some of which are semistructured or just text, and integrates them into composite entities, events and relationships.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…When the integration query is taken as the view definition, deletion propagation becomes the task of suggesting tuples to be deleted from the base relations for eliminating the erroneous conclusion, while minimizing the effect on the remaining conclusions. Furthermore, eliminating tuples from the base relations may itself entail deletion propagation, since these tuples are typically extracted by consulting external (possibly unclean) data sources [23,25].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
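The deletion-propagation task described in the statement above can be illustrated with a minimal sketch. It is not taken from the citing paper: the relations `companies` and `roles`, the join view, and the function names are all hypothetical, and the search is deliberately restricted to deleting a single base tuple, which is a simplification of the general problem.

```python
def join_view(companies, roles):
    """Hypothetical view: (person, sector) for every role joined with its company."""
    return {(person, sector)
            for (person, company) in roles
            for (comp, sector) in companies
            if company == comp}


def best_single_deletion(companies, roles, bad):
    """Return (side_effects, relation, tuple) for the single base-tuple deletion
    that removes `bad` from the view while losing the fewest other view tuples.
    Returns None if `bad` is not in the view or no single deletion removes it."""
    before = join_view(companies, roles)
    if bad not in before:
        return None
    base = [("companies", t) for t in companies] + [("roles", t) for t in roles]
    best = None
    for rel, t in sorted(base):
        c = companies - {t} if rel == "companies" else companies
        r = roles - {t} if rel == "roles" else roles
        after = join_view(c, r)
        if bad in after:
            continue  # this deletion does not eliminate the erroneous tuple
        side_effects = len(before - after) - 1  # view tuples lost besides `bad`
        if best is None or side_effects < best[0]:
            best = (side_effects, rel, t)
    return best


# Toy base relations (assumed data, for illustration only).
companies = {("Acme", "finance"), ("Bolt", "energy")}
roles = {("ann", "Acme"), ("bob", "Acme"), ("cy", "Bolt")}

suggestion = best_single_deletion(companies, roles, ("ann", "finance"))
print(suggestion)
```

Here deleting the `roles` tuple for `ann` removes only the erroneous view tuple, whereas deleting the `Acme` company tuple would also remove the conclusion about `bob`, so the former is preferred.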
“…Approaches for refining rule-based information extraction programs have been recently proposed in [34,7,26]. Shen et al [34] propose an approach for refining rules by posing a series of template questions to the user, where each question asks for additional information about a specific (predefined) feature of the desired extracted data, whereas Chai et al [7] allow users to update any (incorrect) intermediate result derived by the system and proposes techniques for incorporating these updates during program execution.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…In contrast, we develop techniques to automatically compute a (small) set of dictionary entries, therefore allowing the user to focus on a (small) set of base tuples whose removal results in highest quality improvements for the extractor. Liu et al [26] proposed a provenance-based framework for refining information extraction rules. They showed how to use provenance to compute high-level changes, a specific intermediate result whose removal from the output of an operator causes the removal of a false positive from the result, and how multiple high-level changes can be realized via a low-level change: a concrete change to the operator that removes one or more intermediate results from the output of the operator.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
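The distinction between high-level and low-level changes in the statement above can be sketched with a toy dictionary-matching operator. This is an illustrative assumption, not the algorithm of Liu et al.: the dictionary `FIRST_NAMES`, the token list, the user labels, and the ranking heuristic are all hypothetical. Each false-positive match plays the role of a high-level change (an intermediate result to remove), and removing a dictionary entry is a low-level change that realizes several of them at once.

```python
from collections import defaultdict

FIRST_NAMES = {"april", "mark", "anna"}  # hypothetical dictionary of first names


def extract(tokens, dictionary):
    """Dictionary operator: emit (position, token, entry) per match.
    The matched entry is recorded as the provenance of the result."""
    return [(i, tok, tok.lower())
            for i, tok in enumerate(tokens)
            if tok.lower() in dictionary]


def rank_entry_removals(matches, false_positive_positions):
    """Group matches by the dictionary entry that produced them and rank
    candidate entry removals by (true positives lost - false positives removed)."""
    score = defaultdict(lambda: [0, 0])  # entry -> [fp removed, tp lost]
    for pos, _tok, entry in matches:
        score[entry][0 if pos in false_positive_positions else 1] += 1
    candidates = [(e, fp, tp) for e, (fp, tp) in score.items() if fp > 0]
    return sorted(candidates, key=lambda x: (x[2] - x[1], x[0]))


tokens = ["In", "April", "Mark", "asked", "Anna", "to", "mark", "April", "30"]
matches = extract(tokens, FIRST_NAMES)
# Assumed user feedback: matches at positions 1, 6 and 7 are not person names.
ranking = rank_entry_removals(matches, {1, 6, 7})
print(ranking)
```

In this toy input, removing the entry `april` eliminates two false positives at no cost, while removing `mark` eliminates one false positive but also loses one correct match, so `april` ranks first.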