Fabienne Cap scite author profile

Fabienne Cap

5 Publications

158 Citation Statements Received

71 Citation Statements Given

Affiliations

University of Stuttgart, Uppsala University

Publications

The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions

Savary, Ramisch, Cordeiro et al. 2017

Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.

Distinguishing Degrees of Compositionality in Compound Splitting for Statistical Machine Translation

Weller, Cap, Müller et al. 2014

The paper presents an approach to morphological compound splitting that takes the degree of compositionality into account. We apply our approach to German noun compounds and particle verbs within a German-English SMT system, and study the effect of only splitting compositional compounds as opposed to an aggressive splitting. A qualitative study explores the translational behaviour of non-compositional compounds.

How to Produce Unseen Teddy Bears: Improved Morphological Processing of Compounds in SMT

Cap, Fraser, Weller et al. 2014

Compounding in morphologically rich languages is a highly productive process which often causes SMT approaches to fail because of unseen words. We present an approach for translation into a compounding language that splits compounds into simple words for training and, due to an underspecified representation, allows for free merging of simple words into compounds after translation. In contrast to previous approaches, we use features projected from the source language to predict compound mergings. We integrate our approach into end-to-end SMT and show that many compounds matching the reference translation are produced which did not appear in the training data. Additional manual evaluations support the usefulness of generalizing compound formation in SMT.

How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation

Cap, Nirmal, Weller et al. 2015

Support-verb constructions (i.e., multiword expressions combining a semantically light verb with a predicative noun) are problematic for standard statistical machine translation systems, because SMT systems cannot distinguish between literal and idiomatic uses of the verb. We work on the German to English translation direction, for which the identification of support-verb constructions is challenging due to the relatively free word order of German. We show that we achieve improved translation quality for verb-object support-verb constructions by marking the verbs when occurring in such constructions. Additional evaluations revealed that our systems produce more correct verb translations than a contrastive baseline system without verb markup.

Phrase-Based SMT for Finnish with More Data, Better Models and Alternative Alignment and Translation Tools

Tiedemann, Cap, Kanerva et al. 2016

This paper summarises the contributions of the teams at the University of Helsinki, Uppsala University and the University of Turku to the news translation tasks for translating from and to Finnish. Our models address the problem of treating morphology and data coverage in various ways. We introduce a new efficient tool for word alignment and discuss factorisations, gappy language models and reinflection techniques for generating proper Finnish output. The results demonstrate once again that training data is the most effective way to increase translation performance.
