2005
DOI: 10.1007/978-3-540-31865-1_25

A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation

Abstract: We address the problems of 1/ assessing the confidence of the standard point estimates, precision, recall and F-score, and 2/ comparing the results, in terms of precision, recall and F-score, obtained using two different methods. To do so, we use a probabilistic setting which allows us to obtain posterior distributions on these performance indicators, rather than point estimates. This framework is applied to the case where different methods are run on different datasets from the same source, as well…
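The posterior view described in the abstract can be sketched with a short Monte Carlo simulation. The snippet below is a minimal sketch, not the paper's exact derivation: it assumes a Gamma-variate construction over hypothetical confusion counts (tp, fp, fn) with an assumed prior pseudo-count lam, and reports posterior means and 95% intervals for precision, recall and F1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confusion counts from a single evaluation run (illustrative).
tp, fp, fn = 90, 15, 25
lam = 0.5            # prior pseudo-count (assumed; a Jeffreys-style choice)
n = 100_000

# One reading of the paper's construction: model the counts with independent
# Gamma variates that share the true-positive component, so the induced
# precision and recall posteriors are properly correlated.
u = rng.gamma(tp + lam, size=n)   # true-positive mass
v = rng.gamma(fp + lam, size=n)   # false-positive mass
w = rng.gamma(fn + lam, size=n)   # false-negative mass

precision = u / (u + v)           # marginally Beta(tp+lam, fp+lam)
recall = u / (u + w)              # marginally Beta(tp+lam, fn+lam)
f1 = 2 * u / (2 * u + v + w)      # harmonic mean of the two, simplified

for name, s in [("precision", precision), ("recall", recall), ("F1", f1)]:
    lo, hi = np.quantile(s, [0.025, 0.975])
    print(f"{name:9s} mean={s.mean():.3f}  95% interval=({lo:.3f}, {hi:.3f})")
```

Because the samples carry a full distribution rather than a point estimate, two methods can be compared by the posterior probability that one F-score exceeds the other, which is the kind of comparison the abstract advertises.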

Cited by 1,504 publications (871 citation statements)
References 7 publications (12 reference statements)
Citing publications: 2006–2024

Citation statements (ordered by relevance):
“…The test set was compared to the reference set using the weighted harmonic mean of precision and recall (F1), a recognised standard test for measuring performance of information retrieval methods (Goutte and Gaussier, 2005). Table 1 summarises the results of the comparison between manual GOA annotation and our predicted annotation and shows precision, recall and F-Score for the three GO ontology categories.…”
Section: Evaluation of Annotation Methods · Citation type: mentioning · Confidence: 99%
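The "weighted harmonic mean" in this excerpt is the balanced member of the F-beta family. A minimal sketch of the formula, with illustrative inputs that are not taken from the cited work's Table 1:

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision;
    beta = 1 gives the balanced F1 used in the excerpt above.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative values only.
print(f_beta(0.85, 0.70))        # F1  ≈ 0.768
print(f_beta(0.85, 0.70, 2.0))   # F2 weights recall more ≈ 0.726
```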
“…Clusters linked with no reference tree are classified as false positive (FP). We can evaluate the detection accuracy in terms of "recall" (r = TP/(TP + FN)), which indicates the tree detection rate, and "precision" (p = TP/(TP + FP)), which indicates the correctness of the detected trees [31]. Table 3 shows the accuracy assessments for trees located in different forest storeys within six test subplots with ID of DHS0101, DHS0102, DHS0201, DHS0202, DHS0301 and DHS0302.…”
Section: Performance Evaluation · Citation type: mentioning · Confidence: 99%
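The recall and precision defined in this excerpt reduce to simple ratios over matched detections. A minimal sketch of that bookkeeping, with made-up counts (the function name and values are illustrative, not from the cited work's Table 3):

```python
def detection_scores(detected_matched: int, detected_total: int,
                     reference_total: int) -> tuple[float, float]:
    """Recall and precision for object detection, per the excerpt above.

    detected_matched: detections linked to a reference tree (TP)
    detected_total:   all detections (TP + FP)
    reference_total:  all reference trees (TP + FN)
    """
    recall = detected_matched / reference_total       # r = TP / (TP + FN)
    precision = detected_matched / detected_total     # p = TP / (TP + FP)
    return recall, precision

# Illustrative counts only.
r, p = detection_scores(detected_matched=48, detected_total=55,
                        reference_total=59)
print(f"recall={r:.3f}, precision={p:.3f}")
```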
“…For classification tasks, we used the terms "true positives", "true negatives", "false positives" (type I error), and "false negatives" (type II error) to compare the results of the classifier against the gold standard (Goutte & Gaussier, 2005). The terms "positive" and "negative" refer to the result indicated by the classifier, whereas the terms "true" and "false" refer to whether that result corresponds to the gold standard.…”
Section: Sentiment-Driven Feedback in Inter-Editor Communication · Citation type: mentioning · Confidence: 99%
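The TP/TN/FP/FN tallying this excerpt describes can be sketched by comparing predicted labels against gold labels; the label lists below are invented for illustration:

```python
# Binary labels: 1 = positive, 0 = negative (made-up example data).
gold      = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 1]

tp = sum(1 for g, p in zip(gold, predicted) if p == 1 and g == 1)
tn = sum(1 for g, p in zip(gold, predicted) if p == 0 and g == 0)
fp = sum(1 for g, p in zip(gold, predicted) if p == 1 and g == 0)  # type I error
fn = sum(1 for g, p in zip(gold, predicted) if p == 0 and g == 1)  # type II error

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")   # TP=3 TN=2 FP=2 FN=1
```

Note how the naming convention in the excerpt maps directly onto the code: "positive"/"negative" is the classifier's verdict (the `p` value), while "true"/"false" records agreement with the gold standard (the comparison with `g`).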