2008
DOI: 10.1007/s10791-008-9066-8

A comparison of extrinsic clustering evaluation metrics based on formal constraints

Abstract: There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies a…


Cited by 595 publications (234 citation statements)
References 11 publications (9 reference statements)
“…We use argument labels of Hasan and Ng (2014) as target clusters. As noted by Amigó et al (2009), external cluster evaluation is a non-trivial task and there is no consensus on the best approach. We therefore chose to use two established, but rather different measures: the Adjusted Rand Index (ARI) (Hubert and Arabie, 1985) and the information-theoretic V-measure (Rosenberg and Hirschberg, 2007).…”
Section: Analysis 1: Clustering Models (mentioning, confidence: 99%)
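The statement above pairs two external clustering measures. As a minimal illustrative sketch (not code from the cited papers), the Adjusted Rand Index of Hubert and Arabie (1985) can be computed directly from two parallel label lists via pairwise co-assignment counts:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(pred, gold):
    """Adjusted Rand Index between a predicted clustering and gold classes.

    pred, gold: parallel lists giving each item's cluster id / class id.
    Returns 1.0 for identical partitions (up to relabeling), ~0 for chance.
    """
    n = len(pred)
    # Contingency counts: n_ij, row sums a_i, column sums b_j.
    pair_counts = Counter(zip(pred, gold))
    a = Counter(pred)
    b = Counter(gold)
    # Pairs of items co-assigned in both, in pred only, in gold only.
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Perfect agreement up to relabeling scores 1.0; the chance correction is what distinguishes ARI from the raw Rand Index. (The sketch does not guard against degenerate inputs such as a single all-inclusive cluster, where the denominator is zero.)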
“…However, there are many ways to evaluate clustering quality. Amigó et al (2009) propose a set of criteria which a clustering evaluation metric should satisfy, and demonstrate that most popular metrics fail to satisfy at least one of these criteria. However, they prove that all criteria are satisfied by the BCubed metric, which we therefore adopt.…”
Section: Detecting Similar Languages (mentioning, confidence: 99%)
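The BCubed metric adopted in the statement above averages a per-item precision and recall over all items. As a minimal sketch under the standard definition (not code from the cited work):

```python
def bcubed(clusters, labels):
    """BCubed precision and recall, averaged per item.

    clusters, labels: parallel lists giving each item's predicted
    cluster id and gold class id.
    For item i, precision is the fraction of i's cluster sharing i's
    class; recall is the fraction of i's class sharing i's cluster.
    """
    n = len(clusters)
    precision = recall = 0.0
    for i in range(n):
        same_cluster = [j for j in range(n) if clusters[j] == clusters[i]]
        same_label = [j for j in range(n) if labels[j] == labels[i]]
        correct = [j for j in same_cluster if labels[j] == labels[i]]
        precision += len(correct) / len(same_cluster)
        recall += len(correct) / len(same_label)
    return precision / n, recall / n
```

The per-item averaging is what lets BCubed satisfy the formal constraints (e.g. cluster-size sensitivity) that set-matching and pair-counting metrics violate; the two scores are typically combined with a harmonic mean into BCubed F.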
“…There exist a number of different metrics for evaluating cluster quality, including Precision and Recall, Normalized Mutual Information, F-score, B-cubed, et cetera [24]. We describe the one we chose below, which met our desire for a single number that summarizes the essentials and allows us to seamlessly compare performance across all algorithms and their parameters.…”
Section: Evaluation Protocol (mentioning, confidence: 99%)