The closely related fields of question answering and information extraction aim to search large databases of textual material (textbases) to find specific information required by the user (1, 2). Unlike information retrieval systems, which attempt to identify relevant documents that discuss the topic of the user's information need, information extraction systems return the specific information, such as names, dates, or amounts, that the user requests. Although information retrieval systems (such as Google and AltaVista) are now in widespread commercial use, information extraction is a much more difficult task, and, with some notable exceptions, most current systems are research prototypes. However, the potential significance of reliable information extraction systems is substantial. In military, scientific, and business intelligence gathering, the ability to identify specific entities and resources of relevance across documents is crucial. Furthermore, some current information extraction systems now attempt the even more difficult task of providing summaries of relevant information compiled across a document set.

The majority of current information extraction systems are based on surface analysis of text applied to very large textbases. Whereas the dominant approaches of the late 1980s and early 1990s attempted deep linguistic analysis, proposition extraction, and reasoning, most current systems look for answer patterns within the raw text and apply simple heuristics to extract relevant information (3). Such approaches have been shown to work well when information is represented redundantly in the textbase and when the type of the answer is unambiguously specified by the question and tends to be unique within a given sentence or sentence fragment.
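The surface-based strategy just described can be sketched in a few lines. The patterns and the example text below are invented for illustration; real systems employ large hand-built or learned pattern inventories rather than the two regular expressions shown here.

```python
import re

def extract_birth_year(name, text):
    """Scan raw text for surface patterns that state a birth year.

    The two patterns below are illustrative of the kind of answer
    pattern a surface-based system might match directly against raw
    text, with no deep linguistic analysis.
    """
    patterns = [
        r"%s was born in (\d{4})" % re.escape(name),  # "... was born in 1916"
        r"%s \((\d{4})" % re.escape(name),            # "... (1916-2001)"
    ]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return int(match.group(1))
    return None  # answer not stated in a recognized surface form

text = "Claude Shannon (1916-2001) founded information theory."
print(extract_birth_year("Claude Shannon", text))  # prints 1916
```

Note how the approach depends on the conditions stated above: it succeeds only when the answer appears in a predictable surface form, and redundancy in the textbase raises the chance that at least one mention matches a known pattern.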
Although these conditions often hold for general knowledge questions of the kind found in the Text REtrieval Conference (TREC) Question Answering track, there are many intelligence applications for which they cannot be guaranteed. Often, relevant information will be stated only once, or it may only be inferred and never stated explicitly. Furthermore, the results of the most recent TREC question-answering competition suggest that deep reasoning systems may now have reached a level of sophistication that allows them to surpass the performance possible with surface-based approaches. In the 2002 TREC competition, the POWER ANSWER system (4), which converts both questions and answers into propositional form and uses an inference engine, achieved a confidence-weighted score of 0.856, a substantial improvement over the second-placed exactanswer system (5), which received a score of 0.691 in the main question-answering task.

A key component in the performance of the POWER ANSWER system is its use of the WORDNET lexical database (6). WORDNET provides a catalog of simple relationships among words, such as synonymy, hypernymy, and part-of relations, that POWER ANSWER uses to supplement its inference system. Despite the relatively small number of relations considered and the difficulties in achi...
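The kinds of lexical relations WORDNET supplies can be illustrated with a toy in-memory stand-in. The words and links below are invented for illustration; a real system would query the WORDNET database itself, but the lookup logic is the same in spirit.

```python
# Toy stand-ins for the three WORDNET relation types named above:
# synonymy, hypernymy ("is-a"), and part-of. All entries are invented.
HYPERNYMS = {"sparrow": "bird", "bird": "animal", "animal": "organism"}
SYNONYMS = {"bird": {"fowl"}, "car": {"automobile"}}
PART_OF = {"wing": "bird", "feather": "wing"}

def hypernym_chain(word):
    """Walk is-a links upward from a word to ever more general terms,
    as an inference engine might when checking whether a candidate
    answer's type matches the type the question asks for."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain

def is_a(word, category):
    """True if `category` appears anywhere up the hypernym chain."""
    return category in hypernym_chain(word)

print(hypernym_chain("sparrow"))  # prints ['sparrow', 'bird', 'animal', 'organism']
print(is_a("sparrow", "animal"))  # prints True
```

Even this small set of relations lets a system bridge wording gaps between question and answer, e.g., recognizing that a question about an "animal" can be answered by a sentence that mentions only a "sparrow".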