Handbook of Linguistic Annotation 2017
DOI: 10.1007/978-94-024-0881-2_53

The Colorado Richly Annotated Full Text (CRAFT) Corpus: Multi-Model Annotation in the Biomedical Domain

Abstract: A major question in linguistics is whether theoretical accounts of the general language work for specific domains. Similarly, in natural language processing, it is clear that general-domain solutions often fail when applied to specialized domains. One such specialized domain, which is increasingly seen as crucial to understanding human biology and disease, is the biomedical domain. For this reason, biomedical corpus construction has been an area of considerable activity in recent years; for example, just in the…

Citations: cited by 30 publications (18 citation statements)
References: 60 publications (21 reference statements)

“…The CRAFT corpus (Bada et al., 2012; Cohen et al., 2017) is a collection of 97 full-text articles, of which 30 have been released only in the course of the present shared task. The documents were manually annotated with respect to 10 different entity types, linked to 8 manually curated ontologies of biomedical terminology: In addition, the annotations are distributed in an extended variant, i.e. CHEBI EXT, CL EXT etc., resulting in a total of 20 annotation sets.…”
Section: Data (mentioning)
confidence: 99%
“…The contents of the CRAFT corpus have been described extensively elsewhere [77–80]. We focus here on descriptive statistics that are specifically relevant to the coreference annotation.…”
Section: Methods (mentioning)
confidence: 99%
“…[20] Our concept normalization system is ConceptMapper, a high-performance customizable dictionary look-up tool implemented as a UIMA component [23]. Funk et al. determined that ConceptMapper is the best-performing (highest F1 measure) concept recognition software as compared to others.…”
Section: Methods (mentioning)
confidence: 99%
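As a rough illustration of the dictionary look-up idea mentioned in the statement above, the following Python sketch matches dictionary synonyms against text using greedy longest-match over lower-cased word tokens. It is not ConceptMapper (which is a UIMA component) and does not reflect its actual algorithm or parameters; the function names `build_index` and `lookup` and the `EX:` concept identifiers are hypothetical and chosen only for this example.

```python
import re


def build_index(dictionary):
    """Map each lower-cased token tuple of a synonym to its concept identifier.

    `dictionary` maps a concept identifier to a list of synonym strings.
    """
    index = {}
    for concept_id, synonyms in dictionary.items():
        for synonym in synonyms:
            tokens = tuple(re.findall(r"\w+", synonym.lower()))
            if tokens:
                index[tokens] = concept_id
    return index


def lookup(text, index):
    """Greedy longest-match look-up.

    Returns a list of (start, end, concept_id) tuples, where start and end
    are character offsets into `text`.
    """
    matches = list(re.finditer(r"\w+", text))
    tokens = [m.group().lower() for m in matches]
    max_len = max((len(key) for key in index), default=0)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                found.append((matches[i].start(), matches[i + n - 1].end(), index[key]))
                i += n
                break
        else:
            # No dictionary entry starts at this token.
            i += 1
    return found


# Hypothetical toy dictionary; the identifiers are made up for illustration.
toy_dictionary = {
    "EX:0001": ["red blood cell", "erythrocyte"],
    "EX:0002": ["blood"],
}
spans = lookup("Blood and red blood cell counts", build_index(toy_dictionary))
```

Longest-match is only one possible design choice; it prefers the more specific multi-word entry ("red blood cell") over a shorter entry nested inside it ("blood").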
“…Here, we use the Colorado Richly Annotated Full Text Corpus (CRAFT) of full-text biomedical journal articles, annotated with concepts from eight different ontologies [20]. As a baseline concept normalization system, we used the best-performing systems from Funk et al. [21], with the precision-maximizing parameters for each ontology. For each ontology, we tested for a Zipfian distribution, identified the most common concept errors in PubMed Central Open Access, and tested a set of five different potential pre- and post-processing steps that could improve precision.…”
Section: Introduction (mentioning)
confidence: 99%