Acquisition of semantic patterns for information extraction from corpora

Kim, J.-T.; Moldovan, Dan

doi:10.1109/caia.1993.366645

Cited by 35 publications

(19 citation statements)

References 9 publications


“…AutoSlog (Riloff 1993) and PALKA (Kim and Moldovan 1993) were the first IE pattern learning systems. AutoSlog (Riloff 1993; Riloff 1996a) matches a small set of syntactic templates against the text surrounding a desired extraction and creates one (or more) lexico-syntactic patterns by instantiating the templates with the corresponding words in the sentence.…”

Section: Supervised Learning Of Extraction Patterns and Rules (mentioning)

confidence: 99%

“…A "human in the loop" must then manually review the patterns to decide which ones are appropriate for the IE task. PALKA (Kim and Moldovan 1993) uses manually defined frames and keywords that are provided by a user and creates IE patterns by mapping clauses containing the keywords onto the frame's slots. The patterns are generalized based on the semantic features of the words.…”

Section: Supervised Learning Of Extraction Patterns and Rules (mentioning)

confidence: 99%


Information Extraction

2014

Encyclopedia of Social Network Analysis and Mining



“…PALKA (Kim & Moldovan, 1993) uses an induction method similar to Mitchell's candidate elimination algorithm. PALKA is computationally intensive and was implemented on a parallel computer.…”

Section: Crystal (mentioning)

confidence: 99%

Untitled

Soderland

1999

Machine Learning

630


Abstract. A wealth of on-line text information can be made available to automatic processing by information extraction (IE) systems. Each IE application needs a separate set of rules tuned to the domain and writing style. WHISK helps to overcome this knowledge-engineering bottleneck by learning text extraction rules automatically. WHISK is designed to handle text styles ranging from highly structured to free text, including text that is neither rigidly formatted nor composed of grammatical sentences. Such semi-structured text has largely been beyond the scope of previous systems. When used in conjunction with a syntactic analyzer and semantic tagging, WHISK can also handle extraction from free text such as news stories.

Keywords: natural language processing, information extraction, rule learning

Information extraction

As more and more text becomes available on-line, there is a growing need for systems that extract information automatically from text data. An information extraction (IE) system can serve as a front end for high precision information retrieval or text routing, as a first step in knowledge discovery systems that look for trends in massive amounts of text data, or as input to an intelligent agent whose actions depend on understanding the content of text-based information.

IE systems have been developed for writing styles ranging from structured text with tabular information to free text such as news stories. A key element of such systems is a set of text extraction rules that identify relevant information to be extracted.

For structured text, the rules specify a fixed order of relevant information and the labels or HTML tags that delimit strings to be extracted. For free text, an IE system needs several steps in addition to text extraction rules. These include syntactic analysis, semantic tagging, recognizers for domain objects such as person and company names, and discourse processing that makes inferences across sentence boundaries. Extraction rules for free text are typically based on patterns involving syntactic relations between words or semantic classes of words.

Semi-structured text

A useful class of text that falls between these extremes has been largely inaccessible to IE systems. Such semi-structured text is ungrammatical and often telegraphic in style, but does


“…Traditional Information Extraction (IE) systems train extractors for pre-specified relations (Kim and Moldovan, 1993). This approach cannot scale to the web, where target relations are not defined in advance.…”

Section: Introduction (mentioning)

confidence: 99%

ZORE: A Syntax-based System for Chinese Open Relation Extraction

Qiu

Zhang

2014

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Open Relation Extraction (ORE) overcomes the limitations of traditional IE techniques, which train individual extractors for every single relation type. Systems such as ReVerb, PATTY, OLLIE, and Exemplar have attracted much attention on English ORE. However, few studies have been reported on ORE for languages beyond English. This paper presents a syntax-based Chinese (Zh) ORE system, ZORE, for extracting relations and semantic patterns from Chinese text. ZORE identifies relation candidates from automatically parsed dependency trees, and then extracts relations with their semantic patterns iteratively through a novel double propagation algorithm. Empirical results on two data sets show the effectiveness of the proposed system.
