We present a practical HPSG parser for English, an intelligent search engine to retrieve MEDLINE abstracts that represent biomedical events and an efficient MED-LINE search tool helping users to find information about biomedical entities such as genes, proteins, and the interactions between them.
This paper presents a method of automatically constructing information extraction patterns on predicate-argument structures (PASs) obtained by full parsing from a smaller training corpus. Because PASs represent generalized structures for syntactical variants, patterns on PASs are expected to be more generalized than those on surface words. In addition, patterns are divided into components to improve recall and we introduce a Support Vector Machine to learn a prediction model using pattern matching results. In this paper, we present experimental results and analyze them on how well protein-protein interactions were extracted from MEDLINE abstracts. The results demonstrated that our method improved accuracy compared to a machine learning approach using surface word/part-of-speech patterns.
For biomedical information extraction, most systems use syntactic patterns on verbs (anchor verbs) and their arguments. Anchor verbs can be selected by focusing on their arguments. We propose to use predicate-argument structures (PASs), which are outputs of a full parser, to obtain verbs and their arguments. In this paper, we evaluated PAS method by comparing it to a method using part of speech (POSs) pattern matching. POS patterns produced larger results with incorrect arguments, and the results will cause adverse effects on a phase selecting appropriate verbs.
We have developed willex, a tool that helps grammar developers to work efficiently by using annotated corpora and recording parsing errors. Willex has two major new functions. First, it decreases ambiguity of the parsing results by comparing them to an annotated corpus and removing wrong partial results both automatically and manually. Second, willex accumulates parsing errors as data for the developers to clarify the defects of the grammar statistically. We applied willex to a large-scale HPSG-style grammar as an example.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.