Abstract: Explanation-based generalization is used to extract a specialized grammar from the original one using a training corpus of parse trees. This allows much faster parsing and gives a lower error rate, at the price of a small loss in coverage. Previously, it has been necessary to specify the tree-cutting criteria (or operationality criteria) manually; here they are derived automatically from the training set and the desired coverage of the specialized grammar. This is done by assigning an entropy value to each …
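The entropy-based tree-cutting idea can be illustrated with a minimal sketch (the tree encoding, helper names, and threshold below are illustrative assumptions, not the paper's actual implementation): for each phrase category, collect the distribution of expansions observed in the training treebank, score each category by the Shannon entropy of that distribution, and cut trees at categories whose expansion entropy exceeds a threshold.

```python
import math
from collections import Counter, defaultdict

# A parse tree is a nested tuple (label, child, child, ...); a leaf is a bare string.

def expansions(tree, table):
    """Record, for each category, how often each expansion (tuple of child
    labels) occurs in the tree, recursing into non-leaf children."""
    label, *children = tree
    table[label][tuple(c if isinstance(c, str) else c[0] for c in children)] += 1
    for c in children:
        if not isinstance(c, str):
            expansions(c, table)

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as a Counter."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def cutting_categories(treebank, threshold=0.5):
    """Categories whose expansion distribution is 'uncertain' enough
    (entropy above threshold) become tree-cutting points."""
    table = defaultdict(Counter)
    for t in treebank:
        expansions(t, table)
    return {label for label, cnt in table.items() if entropy(cnt) > threshold}
```

Intuition: a category that is always expanded the same way (entropy 0) can safely stay inside a chunked rule, while a category with many competing expansions is a natural cutting point.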
“…There, the basic idea is to learn special grammar rules from the original ones and a set of training examples by chunking together the former based on how they are used to parse the latter. The relevant references are (Samuelsson & Rayner 1991), (Samuelsson 1994a) and …”
A method is given that "inverts" a logic grammar and displays it from the point of view of the logical form, rather than from that of the word string. LR-compiling techniques are used to allow a recursive-descent generation algorithm to perform "functor merging" in much the same way as an LR parser performs prefix merging. This is an improvement on the semantic-head-driven generator that results in a much smaller search space. The amount of semantic lookahead can be varied, and appropriate tradeoff points between table size and resulting nondeterminism can be found automatically. This can be done by removing all spurious nondeterminism for input sufficiently close to the examples of a training corpus, and large portions of it for other input, while preserving completeness.

¹ I wish to thank Gregor Erbach, Jussi Karlgren, Manny Rayner, Hans Uszkoreit, Mats Wirén and the anonymous reviewers of ACL, EACL, IJCAI and RANLP for valuable feedback on previous versions of this article. Special credit is due to Kristina Striegnitz, who assisted with the implementation. Parts of this article have previously appeared as (Samuelsson 1995). The presented work was funded by the N3 "Bidirektionale Linguistische Deduktion (BiLD)" project in the Sonderforschungsbereich 314 Künstliche Intelligenz – Wissensbasierte Systeme.
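The prefix merging that the abstract compares functor merging to can be sketched as a trie over right-hand sides (a hedged illustration with assumed names and encodings; functor merging applies the same sharing on the logical-form side rather than the word-string side): alternatives that share a prefix are stored, and hence explored, only once.

```python
def merge_prefixes(rules):
    """Merge alternative right-hand sides sharing a common prefix into a
    trie, so a top-down engine explores each shared prefix only once.
    Rules are (lhs, rhs) pairs with rhs a tuple of category symbols."""
    trie = {}
    for lhs, rhs in rules:
        node = trie.setdefault(lhs, {})
        for sym in rhs:
            node = node.setdefault(sym, {})
        node["<done>"] = True  # marks a complete right-hand side
    return trie
```

For example, VP -> V NP and VP -> V NP PP collapse into a single V-NP spine with an optional PP continuation, which is exactly the state sharing an LR table achieves for common rule prefixes.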
“…Our pruning strategies are extremely simple. The cutting criteria employed in grammar specialization either require careful manual tuning or more complicated statistical techniques (Samuelsson, 1994); automatically derived cutting criteria, however, perform considerably worse.…”
A corpus-based technique is described to improve the efficiency of wide-coverage, high-accuracy parsers. By keeping track of the derivation steps which lead to the best parse for a very large collection of sentences, the parser learns which parse steps can be filtered without significant loss in parsing accuracy, but with a considerable increase in parsing efficiency. An interesting characteristic of our approach is that it is self-learning, in the sense that it uses unannotated corpora.
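One way to realize such self-learning filtering (a sketch under assumptions: the step representation and the mass threshold are invented here, not taken from the paper) is to parse the unannotated corpus, record which derivation steps the best parses actually use, and keep only the most frequent steps up to a chosen share of the total step occurrences, filtering the rest:

```python
from collections import Counter

def learn_filter(best_parses, keep_fraction=0.99):
    """Count how often each derivation step occurs in the best parses of an
    unannotated corpus; keep the most frequent steps until keep_fraction of
    all step occurrences is covered. Steps outside the returned set are
    filtered at parse time."""
    counts = Counter(step for parse in best_parses for step in parse)
    total = sum(counts.values())
    kept, mass = set(), 0
    for step, n in counts.most_common():
        kept.add(step)
        mass += n
        if mass / total >= keep_fraction:
            break
    return kept
```

Because only the parser's own best parses are counted, no manual annotation is needed; the accuracy/efficiency tradeoff is controlled by `keep_fraction`.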
“…Subsequent work (Rayner and Carter, 1996; Samuelsson, 1994) views the problem as that of cutting up each tree in a treebank of correct parse trees into subtrees, after which the rule combinations corresponding to the subtrees determine the rules of the specialized grammar. This approach reports experimental results, using the SRI Core Language Engine (Alshawi, 1992) in the ATIS domain, of more than a 3-fold speedup at a cost of 5% in grammatical coverage, the latter being compensated for by an increase in parsing accuracy.…”
“…Later work (Samuelsson, 1994; Sima'an, 1999) attempts to automatically determine appropriate tree-cutting criteria, the former using local measures, the latter using global ones.…”
Broad-coverage grammars tend to be highly ambiguous. When such grammars are used in a restricted domain, it may be desirable to specialize them, in effect trading some coverage for a reduction in ambiguity. Grammar specialization is here given a novel formulation as an optimization problem, in which the search is guided by a global measure combining coverage, ambiguity and grammar size. The method, applicable to any unification grammar with a phrase-structure backbone, is shown to be effective in specializing a broad-coverage LFG for French.