In this paper we present an overview of MultiLing 2015, a special session at SIGdial 2015. MultiLing is a community-driven initiative that pushes the state of the art in automatic summarization by providing data sets and fostering further research and development of summarization systems. In total, 23 participants this year submitted their system outputs to one or more of the four MultiLing tasks: MSS, MMS, OnForumS and CCCS. We provide a brief overview of each task, its participation, and its evaluation.
We present a corpus of anaphoric information (coreference) crowdsourced through a game-with-a-purpose. The corpus, containing annotations for about 108,000 markables, is one of the largest coreference corpora for English, and one of the largest crowdsourced NLP corpora, but its main feature is the large number of judgments per markable: 20 on average, and over 2.2M in total. This characteristic makes the corpus a unique resource for the study of disagreements on anaphoric interpretation. A second distinctive feature is its rich annotation scheme, covering singletons, expletives, and split-antecedent plurals. Finally, the corpus also comes with labels inferred using a recently proposed probabilistic model of annotation for coreference. The labels are of high quality and make it possible to successfully train a state-of-the-art coreference resolver, including training on singletons and non-referring expressions. The annotation model can also propose more than one label, or no label, for a markable, thus serving as a baseline method for automatically identifying ambiguous markables. A preliminary analysis of the results is presented.
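The abstract does not say how the model comes to propose several labels or none, but a simple way to realize that idea is to threshold the per-markable posterior label probabilities the annotation model produces. The sketch below is a minimal, hypothetical illustration: the markables, candidate labels, posterior values, and the 0.4 threshold are all invented for illustration, not taken from the corpus or the paper.

```python
def proposed_labels(posterior, threshold=0.4):
    """Propose every label whose posterior probability clears the threshold."""
    return [label for label, p in posterior.items() if p >= threshold]

# Hypothetical posteriors over candidate interpretations for two markables.
markables = {
    "it[3]":   {"expletive": 0.48, "antecedent-A": 0.45, "antecedent-B": 0.07},
    "she[12]": {"antecedent-C": 0.92, "antecedent-D": 0.08},
}
for markable, posterior in markables.items():
    labels = proposed_labels(posterior)
    if len(labels) > 1:
        status = "ambiguous"   # several interpretations remain plausible
    elif not labels:
        status = "no label"    # the model is unsure about all of them
    else:
        status = "resolved"
    print(markable, status, labels)
```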
The analysis of crowdsourced annotations in natural language processing is concerned with identifying (1) gold-standard labels, (2) annotator accuracies and biases, and (3) item difficulties and error patterns. Traditionally, majority voting was used for (1), and coefficients of agreement for (2) and (3). Lately, model-based analyses of corpus annotations have proven better at all three tasks, but there has been relatively little work comparing them on the same datasets. This paper aims to fill this gap by analyzing six models of annotation, covering different approaches to annotator ability, item difficulty, and parameter pooling (tying) across annotators and items. We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators. We conclude with guidelines for model selection, application, and implementation.
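To make the contrast between majority voting and model-based analysis concrete, the sketch below compares a plain majority vote with EM inference for a Dawid-Skene-style model, in which each annotator has a per-class confusion matrix and each item a latent true label. The data, the EM details, and the choice of Dawid-Skene specifically are illustrative assumptions, not necessarily one of the six models the paper compares.

```python
import numpy as np

def majority_vote(annotations, n_items, n_labels):
    """Pick the most frequent label per item (ties broken arbitrarily)."""
    counts = np.zeros((n_items, n_labels))
    for item, _, label in annotations:
        counts[item, label] += 1
    return counts.argmax(axis=1)

def dawid_skene(annotations, n_items, n_annotators, n_labels, iters=50):
    """EM for a Dawid-Skene-style model of annotation."""
    # Initialise per-item label posteriors with normalised vote counts.
    post = np.zeros((n_items, n_labels))
    for item, _, label in annotations:
        post[item, label] += 1
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: class prior and annotator confusion matrices.
        prior = post.mean(axis=0)
        conf = np.full((n_annotators, n_labels, n_labels), 1e-6)
        for item, ann, label in annotations:
            conf[ann, :, label] += post[item]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: recompute the label posterior for each item.
        log_post = np.tile(np.log(prior), (n_items, 1))
        for item, ann, label in annotations:
            log_post[item] += np.log(conf[ann, :, label])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post  # soft labels; argmax gives hard labels

# Toy data: (item, annotator, label) triples; annotator 2 disagrees often,
# the kind of "spammy" behaviour a confusion matrix can discount.
anns = [(0, 0, 1), (0, 1, 1), (0, 2, 0),
        (1, 0, 0), (1, 1, 0), (1, 2, 1),
        (2, 0, 1), (2, 1, 1), (2, 2, 1)]
print(majority_vote(anns, n_items=3, n_labels=2))
print(dawid_skene(anns, n_items=3, n_annotators=3, n_labels=2).argmax(axis=1))
```

Unlike the majority vote, the model's posteriors also yield the annotator characterizations (the confusion matrices) and per-item uncertainty that the paper uses as evaluation aspects.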
Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented. The current paper describes and extends the resource creation activities and evaluations that underpinned experiments and findings that have previously appeared as an LREC workshop paper (El-Haj et al. 2010), a student conference paper (El-Haj et al. 2011b), and a description of a multilingual summarisation pilot (El-Haj et al. 2011c).
Improving Web search technology is a hot topic. One aspect that makes it so interesting is the fact that Web documents are typically not plain text files; instead, they contain a tremendous amount of implicit knowledge stored in the markup of the documents. Much of this need not be used in general Web search, because the search engine doesn't need to understand the documents it is accessing. But what if the document collections you want to search are domain-specific or limited in size? This type of data source is everywhere, from corporate intranets to local Web sites. Wouldn't it be useful to have a simple dialogue system that knows what data is available and can assist users in the search process? Furthermore, shouldn't such a system be portable enough to be run on a completely different collection without much hassle? Here, I present such a search system, based on a generic framework that incorporates a simple domain-independent dialogue manager and an automatically created domain model. I constructed the model by exploiting the markup structure in documents, and I offer two different domains for which users can rapidly construct similar models, applicable without customization.

Searching Web documents

Let us start with some motivating investigations concerning users' behavior when searching the Web. A comprehensive study of Web queries evaluated nearly a billion queries submitted to AltaVista in a 43-day period. [1] The study concluded that queries are normally very short: an average user query is only 2.3 words. It also found that the 25 most common queries account for 1.5 percent of all queries, even though they are only a small fraction of all unique queries. In addition, "for 85 percent of the queries, only the first result screen is viewed, and 77 percent of the sessions only contain one query; that is, the queries were not modified in these sessions." [1]

We can learn at least two lessons from this work. First, because user queries are generally very short, the search engine will generally return numerous documents. Second, the majority of users do not perform any query modifications. A system that applies a domain model to propose possible query refinements must therefore perform extremely well for the user to accept it. Furthermore, researchers have conducted numerous studies to determine whether the search process could benefit from offering potentially relevant terms to the user in an interactive query expansion process. In one study, potential expansion terms were automatically derived from the documents that the original query retrieves. [2] The underlying assumption reads as follows: "It seems reasonable to assume that a searcher, given a list of the query expansion terms, will be able to distinguish the good terms from the bad terms." [2] The study found that when an experienced user performs interactive query expansion, it can significantly improve the search process. However, results also showed that inexperienced users did not make good term selections; for them, interactive query expansion led to no improvement.
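To make the interactive query expansion idea concrete, the sketch below derives candidate expansion terms from the documents retrieved for the original query and ranks them for presentation to the user. The scoring heuristic (document frequency within the retrieved set, weighted by an inverse-document-frequency factor over the whole collection) and all of the example data are illustrative assumptions, not the cited study's actual method.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def expansion_candidates(query, retrieved_docs, all_docs, k=5):
    """Rank terms from the retrieved documents as expansion suggestions."""
    query_terms = set(tokenize(query))
    # Document frequencies in the retrieved set and in the full collection.
    df_retrieved = Counter(t for d in retrieved_docs for t in set(tokenize(d)))
    df_all = Counter(t for d in all_docs for t in set(tokenize(d)))
    # Favour terms common among retrieved docs but rare overall.
    scored = {
        t: df_retrieved[t] * math.log(len(all_docs) / df_all[t])
        for t in df_retrieved
        if t not in query_terms
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Toy collection; the first two documents stand in for the retrieval result.
docs = [
    "dialogue systems assist users in web search",
    "query expansion terms derived from retrieved documents",
    "markup structure reveals implicit knowledge in web documents",
    "a domain model built from document markup supports search",
]
print(expansion_candidates("web search", docs[:2], docs))
```

In an interactive setting, the returned terms would be shown to the searcher, who, per the study's assumption, selects the good ones to refine the query.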
The goal of the ANAWIKI project is to experiment with Web collaboration and human computation to create large-scale linguistically annotated corpora. We present ongoing work and initial results of Phrase Detectives, a game designed to collect judgments about anaphoric annotations.