2010
DOI: 10.1142/s0219720010004562

CALBC Silver Standard Corpus

Abstract: The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the …

Cited by 88 publications (62 citation statements)
References 8 publications

“…However, due to the huge amount of data in the test set, manual annotation of all of it is infeasible. Therefore, a silver corpus approach, such as the one used in the CALBC challenges¹⁷ [16], will be adopted. This silver corpus is built based on voting on the participant submissions.…”
Section: Manual Annotation (citation type: mentioning)
confidence: 99%
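The quoted passage says the silver corpus is built by voting on participant submissions. The sketch below shows a simple majority-style vote under loudly stated assumptions: each submission is a list of (document, start, end, concept) tuples, two systems agree only on an exact span and concept match, and an annotation enters the silver corpus when at least `min_votes` systems propose it. The names `Annotation` and `silver_vote`, the concept identifiers, and the exact-match rule are all hypothetical; this is not presented as the CALBC harmonization procedure itself, which may align boundaries more leniently.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical record for one entity mention: document, character span, concept id."""
    doc_id: str
    start: int
    end: int
    concept_id: str


def silver_vote(submissions: Iterable[Iterable[Annotation]], min_votes: int) -> set[Annotation]:
    """Keep annotations proposed by at least `min_votes` distinct systems.

    Assumption: agreement means an exact match on document, span and concept id.
    """
    votes: Counter = Counter()
    for system_annotations in submissions:
        # set() so a system that repeats an annotation still casts only one vote
        votes.update(set(system_annotations))
    return {ann for ann, count in votes.items() if count >= min_votes}


# Three hypothetical systems annotating the same document "d1".
sys_a = [Annotation("d1", 0, 5, "C0001"), Annotation("d1", 10, 18, "C0002")]
sys_b = [Annotation("d1", 0, 5, "C0001")]
sys_c = [Annotation("d1", 0, 5, "C0001"), Annotation("d1", 10, 18, "C0002"),
         Annotation("d1", 20, 25, "C0003")]

silver = silver_vote([sys_a, sys_b, sys_c], min_votes=2)
print(sorted(a.concept_id for a in silver))  # ['C0001', 'C0002']
```

Raising `min_votes` trades coverage for precision: with `min_votes=3` only the mention proposed by all three systems survives.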
“…The interest in semantic annotation of biomedical entities is such that initiatives such as CALBC (Collaborative Annotation of a Large Biomedical Corpus) [15] have been set up with the goal of providing a silver standard corpus (SSC) of annotated biomedical entities to the community. These annotations are the result of an agreement between the participants' annotations.…”
Section: Biomedical Semantic Annotation Tools (citation type: mentioning)
confidence: 99%
“…More specifically, for each pair of semantic types expressed in the UMLS Semantic Network, we have selected relation strings that are included in other gold standards (e.g., protein-protein, drug-drug, and protein-disease interactions) along with those that appear frequently in our dataset under the corresponding signature. Afterwards, we have manually clustered synonymous strings into 249 groups.¹⁵ As in [54], we use precision (P), recall (R), and F-score (F1) to measure the overlap between best-matching pairs of system-generated clusters and GS groups.…”
Section: Setup (citation type: mentioning)
confidence: 99%
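The passage above scores system-generated clusters of relation strings against gold-standard (GS) groups with precision, recall, and F1 over best matches. The exact matching scheme of [54] is not given here, so the sketch below assumes one plausible reading: each GS group is paired with the system cluster that gives it the highest F1, and P, R, and F1 are macro-averaged over GS groups. The function names and the example relation strings are hypothetical.

```python
def prf(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of one predicted cluster against one gold group."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / len(predicted)
    r = overlap / len(gold)
    return p, r, 2 * p * r / (p + r)


def best_match_scores(clusters: list[set[str]],
                      gold_groups: list[set[str]]) -> tuple[float, float, float]:
    """Macro-average P/R/F1 over gold groups, each scored against its best-F1 cluster.

    Assumption: "best match" means the cluster with the highest F1 for that gold group.
    """
    if not gold_groups:
        raise ValueError("need at least one gold group")
    totals = [0.0, 0.0, 0.0]
    for gold in gold_groups:
        # pick the system cluster whose F1 with this gold group is highest
        best = max((prf(c, gold) for c in clusters), key=lambda s: s[2],
                   default=(0.0, 0.0, 0.0))
        for i in range(3):
            totals[i] += best[i]
    n = len(gold_groups)
    return totals[0] / n, totals[1] / n, totals[2] / n


# Hypothetical system clusters and gold groups of synonymous relation strings.
clusters = [{"inhibits", "blocks", "suppresses"}, {"binds", "interacts with"}]
gold_groups = [{"inhibits", "blocks"}, {"binds", "interacts with", "associates with"}]

scores = best_match_scores(clusters, gold_groups)
print([round(x, 3) for x in scores])  # [0.833, 0.833, 0.8]
```

Macro-averaging gives every gold group equal weight regardless of size; weighting by group size would be an equally defensible alternative.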
“…Despite the undeniable presence of wrong annotations and the absence of many others, previous works have demonstrated that these corpora can support the development of semi-supervised or distantly supervised systems for named-entity¹³ and relationship extraction¹⁴. As manual annotation or validation is not required in this case, such corpora tend to be much larger than the gold-standard ones.…”
Citation type: mentioning
confidence: 99%