2008
DOI: 10.1186/gb-2008-9-s2-s10
Automating curation using a natural language processing pipeline

Abstract: Background: The tasks in BioCreative II were designed to approximate some of the laborious work involved in curating biomedical research papers. The approach to these tasks taken by the University of Edinburgh team was to adapt and extend the existing natural language processing (NLP) system that we have developed as part of a commercial curation assistant. Although this paper concentrates on using NLP to assist with curation, the system can be equally employed to extract types of information from the literatu…

Cited by 15 publications (11 citation statements) | References 25 publications
“…Team 6 (Alex and coworkers [17]) achieved the highest AUC (0.8554), with a precision of 0.7080, a recall of 0.8609, and an F score of 0.7770. They applied an SVM classifier together with careful pre-processing, stemming, part-of-speech (POS) tagging, sentence splitting and shallow parsing.…”
Section: Results (mentioning, confidence: 99%)
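The statement above describes shallow linguistic pre-processing feeding an SVM document classifier. A minimal sketch of that idea follows, using scikit-learn with token n-gram features as a crude stand-in; the Edinburgh team's actual feature set (stemming, POS tags, chunk features) and tooling are not reproduced here, and the training documents are invented for illustration.

```python
# Hedged sketch of an SVM document classifier in the spirit described
# above. TF-IDF over word n-grams stands in for the richer linguistic
# features (stems, POS tags, shallow parses) used by the cited system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = [
    "the protein interacts with the kinase in vitro",   # curatable
    "conference announcement and call for papers",      # not curatable
]
train_labels = [1, 0]

clf = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LinearSVC(),
)
clf.fit(train_docs, train_labels)
print(clf.predict(["the kinase interacts with a second protein"]))
```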
“…As a general trend, the performance of the systems on the SwissProt-only article set was slightly higher, both in terms of recall (in the case of 40 runs) and precision (in the case of 38 runs). Looking at the performance of individual systems, the run submitted by team 4 [21] obtained the highest average precision of 0.39, followed by team 28 [22] with 0.31 and team 6 [17] with 0.28.…”
Section: Results (mentioning, confidence: 99%)
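The runs above are compared by average precision over a ranked list of retrieved articles. As a hedged illustration only (the BioCreative evaluation used its own scripts, not reproduced here), the metric can be computed with scikit-learn; the labels and scores below are invented.

```python
# Hedged sketch: average precision for a ranked retrieval run.
# y_true marks which articles were actually curatable; y_score is the
# system's ranking score. Both arrays are made up for illustration.
from sklearn.metrics import average_precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.35, 0.1]

print(f"average precision: {average_precision_score(y_true, y_score):.2f}")
```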
“…The tagger employs contextual, shallow grammatical, and morphological features tailored to the biomedical domain, as well as a gazetteer of protein names derived from RefSeq. For gene normalization, each of these gene mentions is mapped to a set of possible UniProt identifiers selected from the lexicon using a modified version of the Jaro-Winkler string similarity function [34]. To choose the most likely identifier from the set, a ML-based disambiguator (trained on BioCreative data) and a species tagger (trained on in-house data) are employed.…”
Section: Results (mentioning, confidence: 99%)
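For reference, the standard (unmodified) Jaro-Winkler measure cited above can be sketched as follows. This is the textbook formulation, not the modified version used by the system, whose details are not given here; function and variable names are my own.

```python
# Hedged sketch of plain Jaro-Winkler string similarity: Jaro counts
# matching characters within a sliding window and penalises
# transpositions; Winkler boosts pairs sharing a prefix of up to 4 chars.

def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(len(s1), len(s2)) // 2 - 1
    matched2 = [False] * len(s2)
    matches1 = []
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched2[j] = True
                matches1.append(c)
                break
    if not matches1:
        return 0.0
    matches2 = [c for j, c in enumerate(s2) if matched2[j]]
    transpositions = sum(a != b for a, b in zip(matches1, matches2)) / 2
    m = len(matches1)
    return (m / len(s1) + m / len(s2) + (m - transpositions) / m) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)

# e.g. scoring a gene mention against a candidate lexicon entry:
print(round(jaro_winkler("Cdc2", "CDC2L"), 3))
```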
“…This technique was borrowed from name matching tasks [27] and was also adopted by some teams [28,29] participating in the previous BioCreative gene normalisation tasks. The intuition was that if a text chunk looks similar to a PSI-MI name (e.g., “pull down” vs. “pull-down”), they are likely to refer to the same concept.…”
Section: Methods (mentioning, confidence: 99%)
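A minimal sketch of that intuition: normalise chunks (case folding, treating hyphens as spaces) and fuzzy-match them against controlled-vocabulary names. Here stdlib difflib stands in for the Jaro-Winkler-style similarity used in the cited work, and the tiny PSI-MI name list and threshold are invented for illustration.

```python
# Hedged sketch: mapping text chunks to PSI-MI concepts via
# normalisation plus fuzzy string matching. difflib.SequenceMatcher
# is a stand-in similarity; the name list and threshold are made up.
import difflib

PSI_MI_NAMES = {"pull down": "MI:0096", "two hybrid": "MI:0018"}

def normalise(text: str) -> str:
    # Fold case and treat hyphens as spaces, so "pull-down" == "pull down".
    return " ".join(text.lower().replace("-", " ").split())

def match_chunk(chunk: str, threshold: float = 0.85) -> str | None:
    norm = normalise(chunk)
    best, best_score = None, 0.0
    for name, mi_id in PSI_MI_NAMES.items():
        score = difflib.SequenceMatcher(None, norm, name).ratio()
        if score > best_score:
            best, best_score = mi_id, score
    return best if best_score >= threshold else None

print(match_chunk("Pull-down"))   # -> MI:0096
```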