Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05 2005
DOI: 10.3115/1220575.1220579
On coreference resolution performance metrics

Abstract: The paper proposes a Constrained Entity-Alignment F-Measure (CEAF) for evaluating coreference resolution. The metric is computed by aligning reference and system entities (or coreference chains) with the constraint that a system (reference) entity is aligned with at most one reference (system) entity. We show that the best alignment is a maximum bipartite matching problem which can be solved by the Kuhn-Munkres algorithm. Comparative experiments are conducted to show that the widely-known MUC F-measure has seri…
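The alignment step described in the abstract can be illustrated with a short sketch. The snippet below is not the paper's implementation; it is a minimal sketch of the mention-based variant (CEAF_m), assuming the entity similarity phi(R, S) = |R ∩ S| and using SciPy's linear_sum_assignment as the Kuhn-Munkres solver. Function names, variable names, and the example mention sets are illustrative only.

```python
# Minimal sketch of mention-based CEAF (CEAF_m), assuming phi(R, S) = |R ∩ S|
# (number of shared mentions). The optimal one-to-one entity alignment is a
# maximum bipartite matching, solved here with the Kuhn-Munkres (Hungarian)
# algorithm via SciPy's linear_sum_assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def ceaf_m(reference_entities, system_entities):
    """Each argument is a list of sets of mention identifiers."""
    # Similarity matrix: phi(R_i, S_j) = number of mentions shared by R_i and S_j.
    sim = np.array([[len(r & s) for s in system_entities]
                    for r in reference_entities])
    # linear_sum_assignment minimises total cost, so negate to maximise similarity.
    rows, cols = linear_sum_assignment(-sim)
    total = sim[rows, cols].sum()
    recall = total / sum(len(r) for r in reference_entities)
    precision = total / sum(len(s) for s in system_entities)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative example: two reference entities against two system entities.
ref = [{"m1", "m2", "m3"}, {"m4", "m5"}]
sys_out = [{"m1", "m2"}, {"m3", "m4", "m5"}]
print(ceaf_m(ref, sys_out))  # (0.8, 0.8, 0.8)
```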

Cited by 326 publications (300 citation statements) · References 8 publications
“…In the first one automatically detected mentions are provided to the models and in the second one the mentions are gold. The metrics used in our evaluations are MUC (Vilain et al., 1995), B³ (Bagga and Baldwin, 1998), CEAF_e (Luo, 2005), CEAF_m (Luo, 2005), and BLANC (Recasens and Hovy, 2011). The scores have been calculated using the reference implementation of the CoNLL scorer (Pradhan et al., 2014).…”
Section: Results (mentioning)
confidence: 99%
“…But intrinsic human judgements are simply not consistent and reliable enough to provide an objective meta-evaluation tool. Moreover, all they provide is an insight into what humans (think they) like, not what is best or most useful for them (the two can be two very different matters, as discussed in [4]). …”
Section: Evaluation Methods (mentioning)
confidence: 99%
“…There does not appear to be a single standard evaluation metric in the coreference resolution community. We opted to use the following three: muc-6 [38], ceaf [23], and b-cubed [1], which seem to be the most widely accepted metrics. All three metrics compute Recall, Precision and F-Scores on aligned gold-standard and resolver-tool coreference chains.…”
Section: Automatic Extrinsic Evaluation Of Clarity (mentioning)
confidence: 99%
“…We re-use features that are commonly used for mention pair classification (see e.g., [23], [4]), including grammatical type and subtypes, string and substring matches, apposition and copula, distance (number of separating mentions/sentences/words), gender and number match, synonymy/hypernym and animacy (based on WordNet), family name (based on closed lists), named entity types, syntactic features and anaphoricity detection. Evaluation metrics: the systems' outputs are evaluated using the three standard coreference resolution metrics: MUC [29], B³ [2], and Entity-based CEAF (or CEAF_e) [20]. Following the convention used in CoNLL-2012, we report a global F1-score (henceforth, CoNLL score), which corresponds to an unweighted average of the MUC, B³ and CEAF_e F1 scores.…”
Section: Noun Phrase Coreference Resolution (mentioning)
confidence: 99%
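As a side note to the excerpt above, the CoNLL score it describes is simply an unweighted mean of three F1 values. A minimal sketch, assuming the component F1 scores (MUC, B-cubed, CEAF_e) are already computed and expressed as fractions in [0, 1]; the function name and numbers are illustrative only.

```python
def conll_score(muc_f1, b_cubed_f1, ceaf_e_f1):
    """Unweighted average of the MUC, B-cubed and CEAF_e F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0

print(conll_score(0.70, 0.60, 0.58))  # 0.6266...
```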