2005
DOI: 10.1186/1471-2105-6-s1-s2

BioCreAtIvE Task 1A: gene mention finding evaluation

Abstract: Background: The biological research literature is a major repository of knowledge. As the amount of literature increases, it will get harder to find the information of interest on a particular topic. There has been an increasing amount of work on text mining this literature, but comparing this work is hard because of a lack of standards for making comparisons. To address this, we worked with colleagues at the Protein Design Group, CNB-CSIC, Madrid to develop BioCreAtIvE (Critical Assessment for Information Ext…

Citations: cited by 148 publications (124 citation statements)
References: 13 publications
“…NLPBA (Kim et al, 2004) is a large collection of biomedical abstracts annotated with five entities of interest, such as protein, RNA, and cell-type. BioCreative (Yeh et al, 2005) and FlySlip (Vlachos, 2007) also comprise texts in the biomedical domain, annotated for gene entity mentions in articles from the human and fruit fly literature, respectively. CORA (Peng and McCallum, 2004) consists of two collections: a set of research paper headers annotated for entities such as title, author, and institution; and a collection of references annotated with BibTeX fields such as journal, year, and publisher.…”
Section: Methods (mentioning)
confidence: 99%
“…We report experiments performed on real datasets described in Section 2: BioCreative (Yeh et al, 2005, cf. Figure 5), Genia (Tanabe et al, 2005, cf.…”
Section: Methods (mentioning)
confidence: 99%
“…We used two well-known corpora from the literature that have frequently been used as benchmark in several papers and challenges: GeneTag from Genia dataset by Tanabe et al (2005) and BioCreative dataset from Yeh et al (2005) (the best F-score for gene/protein name extraction on these corpora are respectively 77.8% and 80%). Furthermore, we consider a very large corpus to fully benefit from scalability of the proposed pattern mining techniques.…”
Section: Motivating Example (mentioning)
confidence: 99%
“…protein interactions), the automatic classification of texts, and the generation of new hypotheses on the basis of the available literature [3]. The BioCreAtIvE contest [21] nicely shows, that even sophisticated tools for text mining have a considerable lack of precision and recall: For a simple "named entity recognition"-task the precision ranged up to 86% and the recall was at most 84%. Another attempt is described in [4]: Information about protein-interactions was extracted from a data set of 1.2 million sentences that were taken from biomedical abstracts.…”
Section: Motivation (mentioning)
confidence: 99%