2012
DOI: 10.1016/j.jbi.2011.12.010

A method for determining the number of documents needed for a gold standard corpus

Abstract: The unstructured narratives in medicine have been increasingly targeted for content extraction using the techniques of natural language processing (NLP). In most cases, these efforts are facilitated by creating a manually annotated set of narratives containing the ground truth, commonly referred to as a gold standard corpus. This corpus is used for modeling, fine-tuning, and testing NLP software as well as providing the basis for training in machine learning. Determining the number of annotated documents (size…

Cited by 19 publications
(11 citation statements)
References 9 publications
“…The sparse literature suggests no standard rules for determining sizes of gold standard and training sets. One method of determining the size of gold standard/training corpus is by Juckett et al., 2012; however, that paper also mentions how most studies decide on a gold standard or training size purely by ad hoc reasoning depending on the data, financial, time or personnel constraints 42.…”
Section: Methods (mentioning)
confidence: 99%
“…Finally, a corpus’ size should be dependent on the questions that it is aimed to answer and the type of tasks where it can be applied [12, 13]. However, in practice it is largely restrained according to available resources (time, money, and people).…”
Section: Methods (mentioning)
confidence: 99%
“…Finally, the corpus’ size should be dependent on the questions that it is aimed to answer and the type of tasks where it would be applied ([PCH07], [Juc12]). However, in practice it is largely restrained according to available resources (time, money and people).…”
Section: Methods (mentioning)
confidence: 99%