Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data 2013
DOI: 10.1145/2463676.2465284

Provenance-based dictionary refinement in information extraction

Abstract: Dictionaries of terms and phrases (e.g., common person or organization names) are integral to information extraction systems that extract structured information from unstructured text. Using noisy or unrefined dictionaries may lead to many incorrect results even when highly precise and sophisticated extraction rules are used. In general, the results of the system depend on dictionary entries in arbitrarily complex ways, and removal of a set of entries can remove both correct and incorrect results. Further,…
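
The trade-off the abstract describes (removing a dictionary entry can drop correct results along with incorrect ones) can be illustrated with a small sketch. This is not the paper's algorithm: it swaps in a plain greedy heuristic and assumes a simplified provenance model in which each labeled result lists the dictionary entries it depends on and disappears if any of them is removed. All names here (`Result`, `f1_after_removal`, `greedy_refine`) and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical model: (dictionary entries the result depends on, is it correct?).
# Assumption: a result disappears if ANY entry in its provenance is removed.
Result = Tuple[FrozenSet[str], bool]


def f1_after_removal(results: List[Result], removed: Set[str], total_correct: int) -> float:
    """F1 of the results that survive after deleting `removed` from the dictionary."""
    surviving = [ok for prov, ok in results if not (prov & removed)]
    true_pos = sum(surviving)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(surviving)
    recall = true_pos / total_correct
    return 2 * precision * recall / (precision + recall)


def greedy_refine(results: List[Result], max_removals: int) -> Tuple[Set[str], float]:
    """Greedily remove the entry that raises F1 the most; stop when nothing helps."""
    total_correct = sum(ok for _, ok in results)
    candidates: Set[str] = set().union(*(prov for prov, _ in results))
    removed: Set[str] = set()
    best = f1_after_removal(results, removed, total_correct)
    for _ in range(max_removals):
        scores = {
            entry: f1_after_removal(results, removed | {entry}, total_correct)
            for entry in candidates - removed
        }
        if not scores:
            break
        # sorted() makes tie-breaking deterministic (alphabetical order).
        entry = max(sorted(scores), key=scores.get)
        if scores[entry] <= best:
            break  # no remaining removal improves F1
        removed.add(entry)
        best = scores[entry]
    return removed, best


# Made-up labeled results for a person-name extractor.
results: List[Result] = [
    (frozenset({"smith"}), True),
    (frozenset({"smith"}), True),
    (frozenset({"april"}), False),
    (frozenset({"april"}), False),
    (frozenset({"april"}), True),   # removing "april" also loses this correct result
    (frozenset({"chelsea"}), False),
]

removed, score = greedy_refine(results, max_removals=3)
print(sorted(removed), round(score, 3))
```

In this toy data, removing "april" costs one correct result but still raises F1 once "chelsea" is gone, which is the kind of interaction that makes entry selection non-trivial.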

Cited by 8 publications (1 citation statement)
References 30 publications
“…Provenance-based techniques have also been applied to information extraction problems. Roy et al [15] propose a provenance-based technique to improve the quality of extraction by refining the dictionaries that are used in a rule-based extraction system. A set of entries from the dictionaries that have been involved in generating the output are analyzed to determine which should be removed to improve the extractor's performance most.…”
Section: Related Work (mentioning)
confidence: 99%