In this paper we present an overview of MultiLing 2015, a special session at SIGdial 2015. MultiLing is a community-driven initiative that pushes the state of the art in automatic summarization by providing data sets and fostering further research and development of summarization systems. In total, 23 participants this year submitted their system outputs to one or more of the four MultiLing tasks: MSS, MMS, OnForumS and CCCS. We provide a brief overview of each task, its participation, and its evaluation.
We present a corpus of anaphoric information (coreference) crowdsourced through a game-with-a-purpose. The corpus, containing annotations for about 108,000 markables, is one of the largest coreference corpora for English, and one of the largest crowdsourced NLP corpora, but its main feature is the large number of judgments per markable: 20 on average, and over 2.2M in total. This characteristic makes the corpus a unique resource for the study of disagreements on anaphoric interpretation. A second distinctive feature is its rich annotation scheme, covering singletons, expletives, and split-antecedent plurals. Finally, the corpus also comes with labels inferred using a recently proposed probabilistic model of annotation for coreference. The labels are of high quality and make it possible to successfully train a state-of-the-art coreference resolver, including training on singletons and non-referring expressions. The annotation model can also propose more than one label, or no label, for a markable, thus serving as a baseline method for automatically identifying ambiguous markables. A preliminary analysis of the results is presented.
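The abstract does not say how the model comes to propose several labels or none, but a simple way to realize that idea is to threshold the per-markable posterior label probabilities the annotation model produces. The sketch below is a minimal, hypothetical illustration: the markables, candidate labels, posterior values, and the 0.4 threshold are all invented for illustration, not taken from the corpus or the paper.

```python
def proposed_labels(posterior, threshold=0.4):
    """Propose every label whose posterior probability clears the threshold."""
    return [label for label, p in posterior.items() if p >= threshold]

# Hypothetical posteriors over candidate interpretations for two markables.
markables = {
    "it[3]":   {"expletive": 0.48, "antecedent-A": 0.45, "antecedent-B": 0.07},
    "she[12]": {"antecedent-C": 0.92, "antecedent-D": 0.08},
}
for markable, posterior in markables.items():
    labels = proposed_labels(posterior)
    if len(labels) > 1:
        status = "ambiguous"   # several interpretations remain plausible
    elif not labels:
        status = "no label"    # the model is unsure about all of them
    else:
        status = "resolved"
    print(markable, status, labels)
```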
The analysis of crowdsourced annotations in natural language processing is concerned with identifying (1) gold-standard labels, (2) annotator accuracies and biases, and (3) item difficulties and error patterns. Traditionally, majority voting was used for (1), and coefficients of agreement for (2) and (3). Lately, model-based analyses of corpus annotations have proven better at all three tasks, but there has been relatively little work comparing them on the same datasets. This paper aims to fill this gap by analyzing six models of annotation, covering different approaches to annotator ability, item difficulty, and parameter pooling (tying) across annotators and items. We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators. We conclude with guidelines for model selection, application, and implementation.
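To make the contrast between majority voting and model-based analysis concrete, the sketch below compares a plain majority vote with EM inference for a Dawid-Skene-style model, in which each annotator has a per-class confusion matrix and each item a latent true label. The data, the EM details, and the choice of Dawid-Skene specifically are illustrative assumptions, not necessarily one of the six models the paper compares.

```python
import numpy as np

def majority_vote(annotations, n_items, n_labels):
    """Pick the most frequent label per item (ties broken arbitrarily)."""
    counts = np.zeros((n_items, n_labels))
    for item, _, label in annotations:
        counts[item, label] += 1
    return counts.argmax(axis=1)

def dawid_skene(annotations, n_items, n_annotators, n_labels, iters=50):
    """EM for a Dawid-Skene-style model of annotation."""
    # Initialise per-item label posteriors with normalised vote counts.
    post = np.zeros((n_items, n_labels))
    for item, _, label in annotations:
        post[item, label] += 1
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: class prior and annotator confusion matrices.
        prior = post.mean(axis=0)
        conf = np.full((n_annotators, n_labels, n_labels), 1e-6)
        for item, ann, label in annotations:
            conf[ann, :, label] += post[item]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: recompute the label posterior for each item.
        log_post = np.tile(np.log(prior), (n_items, 1))
        for item, ann, label in annotations:
            log_post[item] += np.log(conf[ann, :, label])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post  # soft labels; argmax gives hard labels

# Toy data: (item, annotator, label) triples; annotator 2 disagrees often,
# the kind of "spammy" behaviour a confusion matrix can discount.
anns = [(0, 0, 1), (0, 1, 1), (0, 2, 0),
        (1, 0, 0), (1, 1, 0), (1, 2, 1),
        (2, 0, 1), (2, 1, 1), (2, 2, 1)]
print(majority_vote(anns, n_items=3, n_labels=2))
print(dawid_skene(anns, n_items=3, n_annotators=3, n_labels=2).argmax(axis=1))
```

Unlike the majority vote, the model's posteriors also yield the annotator characterizations (the confusion matrices) and per-item uncertainty that the paper uses as evaluation aspects.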
Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented. The current paper describes and extends the resource creation activities and evaluations that underpinned experiments and findings that have previously appeared as an LREC workshop paper (El-Haj et al. 2010), a student conference paper (El-Haj et al. 2011b), and a description of a multilingual summarisation pilot (El-Haj et al. 2011c).
Improving Web search technology is a hot topic. One aspect that makes it so interesting is the fact that Web documents are typically not plain text files; instead, they contain a tremendous amount of implicit knowledge stored in the markup of the documents. Much of this need not be used in general Web search, because the search engine doesn't need to understand the documents it is accessing. But what if the document collections you want to search are domain-specific or limited in size? This type of data source is everywhere, from corporate intranets to local Web sites. Wouldn't it be useful to have a simple dialogue system that knows what data is available and can assist users in the search process? Furthermore, shouldn't such a system be portable enough to be run on a completely different collection without much hassle? Here, I present such a search system, based on a generic framework that incorporates a simple domain-independent dialogue manager and an automatically created domain model. I constructed the model by exploiting the markup structure in documents, and I offer two different domains for which users can rapidly construct similar models, applicable without customization.

Searching Web documents

Let us start with some motivating investigations concerning users' behavior when searching the Web. A comprehensive study of Web queries evaluated nearly a billion queries submitted to AltaVista in a 43-day period. [1] The study concluded that queries are normally very short: an average user query is only 2.3 words. It also found that the 25 most common queries account for 1.5 percent of all queries, even though they are only a small fraction of all unique queries. In addition, "for 85 percent of the queries, only the first result screen is viewed, and 77 percent of the sessions only contain one query; that is, the queries were not modified in these sessions." [1]

We can learn at least two lessons from this work. First, because user queries are generally very short, the search engine will generally return numerous documents. Second, the majority of users do not perform any query modifications. A system that applies a domain model to propose possible query refinements must therefore perform extremely well for the user to accept it. Furthermore, researchers have conducted numerous studies to determine whether the search process could benefit from offering potentially relevant terms to the user in an interactive query expansion process. In one study, potential expansion terms were automatically derived from the documents that the original query retrieves. [2] The underlying assumption reads as follows: "It seems reasonable to assume that a searcher, given a list of the query expansion terms, will be able to distinguish the good terms from the bad terms." [2] The study found that when an experienced user performs interactive query expansion, it can significantly improve the search process. However, results also showed that inexperienced users did not make good term selections; for them, interactive query expansion led to no improvement.
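To make the interactive query expansion idea concrete, the sketch below derives candidate expansion terms from the documents retrieved for the original query and ranks them for presentation to the user. The scoring heuristic (document frequency within the retrieved set, weighted by an inverse-document-frequency factor over the whole collection) and all of the example data are illustrative assumptions, not the cited study's actual method.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def expansion_candidates(query, retrieved_docs, all_docs, k=5):
    """Rank terms from the retrieved documents as expansion suggestions."""
    query_terms = set(tokenize(query))
    # Document frequencies in the retrieved set and in the full collection.
    df_retrieved = Counter(t for d in retrieved_docs for t in set(tokenize(d)))
    df_all = Counter(t for d in all_docs for t in set(tokenize(d)))
    # Favour terms common among retrieved docs but rare overall.
    scored = {
        t: df_retrieved[t] * math.log(len(all_docs) / df_all[t])
        for t in df_retrieved
        if t not in query_terms
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Toy collection; the first two documents stand in for the retrieval result.
docs = [
    "dialogue systems assist users in web search",
    "query expansion terms derived from retrieved documents",
    "markup structure reveals implicit knowledge in web documents",
    "a domain model built from document markup supports search",
]
print(expansion_candidates("web search", docs[:2], docs))
```

In an interactive setting, the returned terms would be shown to the searcher, who, per the study's assumption, selects the good ones to refine the query.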
The goal of the ANAWIKI project is to experiment with Web collaboration and human computation to create large-scale linguistically annotated corpora. We present ongoing work and initial results of Phrase Detectives, a game designed to collect judgments about anaphoric annotations.