This note describes a scoring scheme for the coreference task in MUC6. It improves on the original approach by: (1) grounding the scoring scheme in terms of a model; (2) producing more intuitive recall and precision scores; and (3) not requiring explicit computation of the transitive closure of coreference. The principal conceptual difference is that we have moved from a syntactic scoring model based on following coreference links to an approach defined by the model theory of those links.
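The model-theoretic scheme can be sketched concretely: recall for each key equivalence class depends only on how the response partitions that class, so no transitive closure is ever materialized. A minimal sketch, assuming mentions are hashable ids and each of "key" and "response" is a list of sets of mentions:

```python
# Sketch of a model-theoretic MUC coreference scorer. For each key
# class S, the recall contribution is (|S| - |p(S)|) / (|S| - 1),
# where p(S) is the set of pieces S is cut into by the response.

def partition_size(cls, other):
    """Number of pieces `cls` is cut into by the classes of `other`,
    counting mentions absent from `other` as singleton pieces."""
    pieces = set()
    singletons = 0
    for m in cls:
        for i, oc in enumerate(other):
            if m in oc:
                pieces.add(i)
                break
        else:
            singletons += 1
    return len(pieces) + singletons

def muc_recall(key, response):
    num = sum(len(s) - partition_size(s, response) for s in key)
    den = sum(len(s) - 1 for s in key)
    return num / den if den else 0.0

def muc_precision(key, response):
    # Precision is recall with the roles of key and response swapped.
    return muc_recall(response, key)
```

For example, if the key links mentions {1, 2, 3, 4} into one entity but the response produces {1, 2} and {3, 4}, recall is (4 - 2)/(4 - 1) = 2/3 while precision is 1, since every link the response did assert is consistent with the key.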
Historically, tailoring language processing systems to specific domains and languages for which they were not originally built has required a great deal of effort. Recent advances in corpus-based manual and automatic training methods have shown promise in reducing the time and cost of this porting process. These developments have focused even greater attention on the bottleneck of acquiring reliable, manually tagged training data. This paper describes a new set of integrated tools, collectively called the Alembic Workbench, that uses a mixed-initiative approach to "bootstrapping" the manual tagging process, with the goal of reducing the overhead associated with corpus development. Initial empirical studies using the Alembic Workbench to annotate "named entities" demonstrate that this approach can approximately double the production rate. As an added benefit, the combined efforts of machine and user produce domain-specific annotation rules that can be used to annotate similar texts automatically through the Alembic NLP system. The ultimate goal of this project is to enable end users to generate a practical domain-specific information extraction system within a single session.
We present a novel approach to parsing phrase grammars based on Eric Brill's notion of rule sequences. The basic framework we describe has somewhat less power than a finite-state machine, and yet achieves high accuracy on standard phrase parsing tasks. The rule language is simple, which makes it easy to write rules. Further, this simplicity enables the automatic acquisition of phrase-parsing rules through an error-reduction strategy.
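The rule-sequence idea can be illustrated with a toy chunker: start from a crude baseline labeling, then apply an ordered list of local relabeling rules. The IOB-style labels, the rule shape, and the POS tags below are illustrative assumptions, not the paper's actual rule language:

```python
# Sketch of Brill-style rule-sequence phrase chunking. Each token gets
# a baseline label ("I" = inside a noun phrase, "O" = outside), then
# rules fire in order. A rule (own_pos, right_pos, new_label) relabels
# a token when its POS and its right neighbor's POS both match.

def apply_rules(pos_tags, rules):
    # Baseline: mark noun-like tokens as inside an NP.
    labels = ["I" if t.startswith("NN") else "O" for t in pos_tags]
    for own, right, new in rules:
        for i in range(len(labels) - 1):
            if pos_tags[i] == own and pos_tags[i + 1] == right:
                labels[i] = new
    return labels
```

For instance, a single rule ("DT", "NN", "I") pulls a determiner into the noun phrase headed by the noun to its right; error-driven learning would select such rules by how much each one reduces mismatches against a bracketed corpus.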
This paper is concerned with statistical methods for treating long-distance dependencies. We focus in particular on a case of substantial recent interest: that of long-distance dependency effects in entity extraction. We introduce a new approach to capturing these effects through a simple feature copying preprocess, and demonstrate substantial performance gains on several entity extraction tasks.
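One way to realize such a feature-copying preprocess is to let later occurrences of a token inherit a context feature from its first occurrence, so a model restricted to local features still sees document-level evidence (e.g., a title like "Mr." seen only at the first mention of a name). The feature names and the choice of which context to copy are illustrative assumptions, not the paper's specification:

```python
# Sketch of a feature-copying preprocess for long-distance
# dependencies in entity extraction: every occurrence of a word is
# annotated with the left-neighbor context of that word's FIRST
# occurrence in the document, in addition to its own local context.

def copy_first_occurrence_features(tokens):
    first_left = {}   # word -> left neighbor at its first occurrence
    feats = []
    for i, tok in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else "<S>"
        if tok not in first_left:
            first_left[tok] = left
        feats.append({"word": tok,
                      "left": left,
                      "first_left": first_left[tok]})
    return feats
```

In the sequence ["Mr.", "Smith", "said", "Smith", "left"], the second "Smith" has local left context "said" but inherits "Mr." as its copied feature, which is exactly the non-local cue a local tagger would otherwise miss.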