Proceedings of the 14th ACM International Conference on Information and Knowledge Management 2005
DOI: 10.1145/1099554.1099678

Predicting accuracy of extracting information from unstructured text collections

Abstract: Exploiting lexical and semantic relationships in large unstructured text collections can significantly enhance managing, integrating, and querying information locked in unstructured text. Most notably, named entities and relations between entities are crucial for effective question answering and other information retrieval and knowledge management tasks. Unfortunately, the success in extracting these relationships can vary for different domains, languages, and document collections. Predicting extraction perfor…

Cited by 28 publications (32 citation statements)
References 23 publications
“…Resource selection approaches generally consist of two steps: (1) build a compact, representative collection summary (e.g., consisting of word frequency vectors [11,15] or document samples [30,32]); (2) relevance estimation: to process a given query, use the collection descriptors to estimate the number of topically relevant documents in each collection, and rank the collections accordingly. Unlike in distributed IR, our IE scenario requires that we identify collections with useful documents for the IE task, rather than collections with documents that are topically relevant to a given query.…”
Section: Problem Definition
confidence: 99%
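The two-step resource selection scheme described in the quote above can be sketched as follows. This is a minimal illustration only, assuming word-frequency vectors as the collection summaries and a smoothed unigram query-likelihood score; the collection names and data are hypothetical, not from the cited work:

```python
from collections import Counter
from math import log

# Step 1: compact collection summaries as word-frequency vectors
# (hypothetical collections for illustration).
summaries = {
    "newswire": Counter("company acquired merger ceo company".split()),
    "biomed":   Counter("protein gene expression protein cell".split()),
}

def score(query, summary, mu=1.0):
    """Unigram query log-likelihood under an additively smoothed model."""
    total = sum(summary.values())
    vocab = len(summary)
    return sum(
        log((summary[w] + mu) / (total + mu * vocab))
        for w in query.split()
    )

def rank_collections(query):
    """Step 2: rank collections by estimated relevance to the query."""
    return sorted(summaries, key=lambda c: score(query, summaries[c]), reverse=True)

print(rank_collections("company merger"))  # ['newswire', 'biomed']
```

As the quote notes, the IE setting would replace the query-relevance score with an estimate of how many documents are *useful for the extraction task*, which this sketch does not attempt.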
“…Earlier efforts to identify collections for an extraction task (e.g., [1,21]) have focused on examining the quality of the extraction output, rather than its volume. The (complementary) methods described in this paper can be adapted to consider quality (see Section 6).…”
Section: Problem Definition
confidence: 99%
“…A related effort to this paper is [1], which presents an approach to examining the quality of a relation that could be generated by an extraction system over a text database. Specifically, [1] builds language models for a text database and compares them against those for an extraction system to examine the relation quality.…”
Section: Related Work
confidence: 99%
“…Specifically, [1] builds language models for a text database and compares them against those for an extraction system to examine the relation quality. Our proposed algorithms are comparatively lightweight in that we eliminate the need for any such (potentially expensive) text analysis or for any a priori database- or extraction-related knowledge.…”
Section: Related Work
confidence: 99%
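The language-model comparison that the quotes attribute to [1] can be sketched as building a unigram model of the database and one of the extraction output, then comparing them, e.g. with KL divergence. This is a hedged illustration under assumed choices (unigram models, additive smoothing, KL as the comparison measure), not the cited paper's exact formulation:

```python
from collections import Counter
from math import log

def unigram_lm(text, vocab, eps=1e-6):
    """Smoothed unigram distribution over a fixed shared vocabulary."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {w: (counts[w] + eps) / (total + eps * len(vocab)) for w in vocab}

def kl_divergence(p, q):
    """KL(p || q): how poorly q models samples drawn from p."""
    return sum(p[w] * log(p[w] / q[w]) for w in p)

# Hypothetical texts for illustration only.
database = "acme corp was acquired by widget inc in 2004"
extraction_output = "acme corp acquired widget inc"
vocab = set((database + " " + extraction_output).split())

p = unigram_lm(extraction_output, vocab)
q = unigram_lm(database, vocab)

# A small divergence suggests the extraction output is well supported by the
# database's language model; a large one would flag questionable quality.
print(kl_divergence(p, q))
```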
“…To discover relations between two named entities, a number of works [2], [3], [22] have proposed methods that identify relations using the context words between the entities. In [23], Agichtein and Cucerzan claimed that relation extraction from text documents is a harder task than named entity recognition. They proposed a general language-modeling method for quantifying the difficulty of information extraction by predicting performance for named entity recognition (locations, organizations, person names, and miscellaneous named entities) and for relation extraction (birth dates, death dates, and invention names).…”
Section: Introduction
confidence: 99%
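The context-word idea mentioned in the quote ([2], [3], [22]) can be illustrated with a minimal sketch that pulls out the tokens lying between two entity mentions. The entity spans are supplied by hand here; in a real pipeline they would come from a named-entity recognizer, and the sentence is purely illustrative:

```python
def context_between(tokens, span_a, span_b):
    """Return the tokens between two entity spans (spans are end-exclusive)."""
    (s1, e1), (s2, e2) = sorted([span_a, span_b])
    return tokens[e1:s2]

tokens = "Thomas Edison invented the phonograph in 1877".split()
# Hypothetical spans: "Thomas Edison" = tokens[0:2], "the phonograph" = tokens[3:5]
print(context_between(tokens, (0, 2), (3, 5)))  # ['invented']
```

Relation-discovery methods of this kind then cluster or classify entity pairs by the context words collected across many sentences.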