The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including nonreferring NPs; and the annotation of a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. The corpus, however, has not been extensively used for anaphora resolution research so far. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task: identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis; the evaluation scripts assessing system performance on those datasets; and preliminary results on these three tasks that may serve as baselines for subsequent research on these phenomena.
Common technologies for automatic coreference resolution require either a language-specific rule set or large collections of manually annotated data, which are typically limited to newswire texts in major languages. This makes it difficult to develop coreference resolvers for many so-called low-resource languages. We apply a direct projection algorithm on a multi-genre and multilingual corpus (English, German, Russian) to automatically produce coreference annotations for two target languages without exploiting any linguistic knowledge of those languages. Our evaluation of the projected annotations shows promising results, and the error analysis reveals structural differences in referring expressions and coreference chains across the three languages, which can now be targeted with more linguistically informed projection algorithms.
In this paper, we examine the possibility of using annotation projection from multiple sources for automatically obtaining coreference annotations in the target language. We implement a multi-source annotation projection algorithm and apply it to an English-German-Russian parallel corpus in order to transfer coreference chains from two source sides to the target side. Operating in two settings, a low-resource one and a more linguistically informed one, we show that automatic coreference transfer can benefit from combining information from multiple languages, and we assess the quality of both the extraction and the linking of target coreference mentions.
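The projection idea in the two abstracts above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes source-side coreference mentions given as token spans and a word alignment, and maps each mention onto the contiguous hull of its aligned target tokens.

```python
# Illustrative sketch of direct annotation projection (hypothetical helper,
# not the papers' code): project source-side coreference mentions onto the
# target side of a parallel sentence via word alignments.

def project_mentions(mentions, alignments):
    """mentions: list of (chain_id, (start, end)) token spans on the source side.
    alignments: dict mapping a source token index to a list of target indices.
    Returns projected (chain_id, (start, end)) spans on the target side."""
    projected = []
    for chain_id, (start, end) in mentions:
        targets = sorted(t for s in range(start, end + 1)
                         for t in alignments.get(s, []))
        if targets:
            # keep the contiguous hull of all aligned target tokens
            projected.append((chain_id, (targets[0], targets[-1])))
    return projected

# Toy example: a source mention over tokens 0-2 aligned to target tokens 1-3.
mentions = [(0, (0, 2))]
alignments = {0: [1], 1: [2], 2: [3]}
print(project_mentions(mentions, alignments))  # [(0, (1, 3))]
```

Multi-source projection, as in the second paper, would additionally intersect or vote over the spans projected from each source language; the error modes mentioned in the abstracts (structural differences in referring expressions) show up here as non-contiguous or empty alignments.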
We report on experiments to validate and extend two language-specific connective databases (German and Italian) using a word-aligned corpus. This is a first step toward constructing a bilingual lexicon of connectives that are linked via their discourse senses.
Anaphoric connectives are event anaphors (or abstract anaphors) that in addition convey a coherence relation holding between the antecedent and the host clause of the connective. Some of them carry an explicitly anaphoric morpheme, others do not. We analysed the set of German connectives for this property and found that many have an additional nonconnective reading, where they serve as nominal anaphors. Furthermore, many connectives can have multiple senses, so altogether the processing of these words can involve substantial disambiguation. We study the problem for one specific German word, demzufolge, which can be taken as representative of a large group of similar words.
In this paper, we introduce a typology of bridging relations applicable to multiple languages and genres. After discussing our annotation guidelines, we describe annotation experiments on the German part of our parallel coreference corpus and show that our interannotator agreement results are reliable, considering both antecedent selection and relation assignment. In order to validate our theoretical model on other languages, we manually transfer German annotations to the English and Russian sides of the corpus and briefly discuss first results that suggest the promise of our approach. Furthermore, for the complete exploration of extended coreference relations, we exploit an existing near-identity scheme to augment our annotations with near-identity links, and we report on the results.
Truecasing, the task of restoring proper case to (generally) lowercase input, is important in downstream tasks and for screen display. In this paper, we investigate truecasing as an intrinsic task and present several experiments on noisy user queries to a voice-controlled dialog system. In particular, we compare rule-based, n-gram language model (LM), and recurrent neural network (RNN) approaches, evaluating the results on a German Q&A corpus and reporting accuracy for different case categories. We show that while RNNs reach higher accuracy, especially on large datasets, character n-gram models with interpolation are still competitive, in particular on mixed-case words, where their fall-back mechanisms come into play.
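To make the truecasing task concrete, here is a minimal frequency-based baseline, purely illustrative and much simpler than the LM and RNN models compared in the paper: it memorizes the most frequent surface form of each word in training text and falls back to the lowercase input for unseen words.

```python
from collections import Counter, defaultdict

# Hypothetical baseline truecaser (not one of the paper's systems):
# choose the most frequent cased form observed in training data.

def train_truecaser(corpus_sentences):
    counts = defaultdict(Counter)
    for sentence in corpus_sentences:
        for token in sentence.split():
            counts[token.lower()][token] += 1
    # map each lowercase word to its most frequent surface form
    return {w: forms.most_common(1)[0][0] for w, forms in counts.items()}

def truecase(model, lowercased):
    # fall back to the input token for out-of-vocabulary words
    return " ".join(model.get(tok, tok) for tok in lowercased.split())

model = train_truecaser(["wir fahren nach Berlin", "Berlin ist eine Stadt"])
print(truecase(model, "wir fahren nach berlin"))  # wir fahren nach Berlin
```

A word-level memorizer like this cannot generalize to unseen or mixed-case words, which is exactly where the character n-gram fall-back mechanisms and RNN models discussed in the abstract have their advantage.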
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.