We propose an end-to-end coreference resolution system obtained by adapting neural models that have recently improved the state of the art on the OntoNotes benchmark so that they apply to other paradigms for this task. We report the performance of our system on ANCOR, a corpus of transcribed spoken French, for which it constitutes a new baseline with proper evaluation.
Over the last couple of years, Recurrent Neural Networks (RNNs) have reached state-of-the-art performance on most sequence modelling problems. In particular, the sequence-to-sequence model and the neural CRF have proved very effective in this domain. In this article, we propose a new RNN architecture for sequence labelling that leverages gated recurrent layers to take arbitrarily long contexts into account and uses two decoders, one operating forward and one backward. We compare several variants of the proposed solution and their performance to the state of the art. Most of our results are better than the state of the art or very close to it and, thanks to the use of recent technologies, our architecture can scale to corpora larger than those used in this work.
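As a rough illustration of what a gated recurrent layer read in both directions looks like, the sketch below runs a scalar GRU-style cell left to right and right to left over a sequence and pairs the two states at each position. It is not the architecture proposed in the article: the weights are arbitrary untrained constants, the hidden state is a single number, the gate convention is one of several in use, and the names `gru_step`, `run` and `bidirectional_states` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, w):
    """One scalar GRU-style update; `w` holds illustrative, untrained weights."""
    z = sigmoid(w["wz"] * x + w["uz"] * h)              # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)              # reset gate
    cand = math.tanh(w["wh"] * x + w["uh"] * (r * h))   # candidate state
    # Assumed convention: z close to 0 keeps the old state, z close to 1 replaces it.
    return (1.0 - z) * h + z * cand


def run(xs, w, h0=0.0):
    """Apply the cell along xs and return the state after each input."""
    states = []
    h = h0
    for x in xs:
        h = gru_step(x, h, w)
        states.append(h)
    return states


def bidirectional_states(xs, w_fwd, w_bwd):
    """Pair, for each position, the left-to-right and right-to-left states."""
    fwd = run(xs, w_fwd)
    bwd = run(xs[::-1], w_bwd)[::-1]
    return list(zip(fwd, bwd))


# Arbitrary weights chosen only so the example runs.
WEIGHTS = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "wh": 1.0, "uh": 1.0}
states = bidirectional_states([1.0, -1.0, 0.5], WEIGHTS, WEIGHTS)
```

Each entry of `states` holds one value summarising everything to the left of a position and one summarising everything to its right, which is the sense in which such layers can take long contexts into account. How the article's two decoders consume this kind of information is not specified in the abstract and is not modelled here.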
We propose a method for investigating the interpretability of metrics used for the coreference resolution task through comparisons with human judgments. We provide a corpus annotated with different error types, together with human evaluations of their severity. Our preliminary analysis shows that, compared with human judges, the metrics considerably overlook several error types and overlook errors in general. This study is conducted on French texts, but the methodology should be language-independent.
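To give a sense of how a coreference metric folds different error types into one number, here is a minimal sketch of B-cubed, a commonly used coreference metric. The abstract does not say which metrics were studied, so the choice of B-cubed is an assumption made for illustration, as are the function name `b_cubed` and the toy mentions. The sketch assumes gold and predicted clusterings cover exactly the same mentions.

```python
def b_cubed(gold, predicted):
    """Return (precision, recall, f1) for two clusterings of the same mentions.

    `gold` and `predicted` are lists of sets of mention identifiers.
    """
    gold_of = {m: cluster for cluster in gold for m in cluster}
    pred_of = {m: cluster for cluster in predicted for m in cluster}
    if set(gold_of) != set(pred_of):
        raise ValueError("both clusterings must cover the same mentions")

    mentions = list(gold_of)
    # Per-mention overlap between its gold cluster and its predicted cluster.
    precision = sum(
        len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in mentions
    ) / len(mentions)
    f1 = 0.0 if precision + recall == 0 else (
        2 * precision * recall / (precision + recall)
    )
    return precision, recall, f1


# Toy example: the prediction splits "c" off its chain and merges it with "d".
gold = [{"a", "b", "c"}, {"d"}]
predicted = [{"a", "b"}, {"c", "d"}]
precision, recall, f1 = b_cubed(gold, predicted)
```

In this toy case a wrong split and a wrong merge are both absorbed into a single pair of averages, which is the kind of behaviour that makes it hard to tell from the score alone which errors a system made or how serious a human reader would find them.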
Automatic detection of coreference chains for French is still a relatively unexplored area, partly because suitable annotated resources were developed late. The Democrat corpus, the first large-scale corpus of written French annotated with coreference chains, makes it possible to use machine learning techniques to fill this gap. In this work, we present DeCOFre, the first coreference chain detection system for spoken French, and study its use for processing Democrat. Our experiments show that this system is not robust to the changes induced by moving from spontaneous speech to written text, and suggest that the particularities of Democrat could be better handled by architectures richer than those of the end-to-end systems that are ubiquitous in the recent state of the art.
Keywords: machine learning, artificial neural networks, automatic coreference chain detection, French