Nicola Cancedda scite author profile

Nicola Cancedda

5 Publications

210 Citation Statements Received

158 Citation Statements Given


Affiliations

Meta (United Kingdom), Xerox (France), Microsoft (United Kingdom)

Publications


Multilingual Autoregressive Entity Linking

Cao, Popat, et al. 2022


We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem: the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode mention string and entity names to capture more interactions than the standard dot product between mention and entity vectors. It also enables fast search within a large KB even for mentions that do not appear in mention tables and with no need for large-scale vector indices. While prior MEL works use a single representation for each entity, we match against entity names of as many languages as possible, which allows exploiting language connections between source input and target name. Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time. This leads to over 50% improvements in average accuracy. We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks where we establish new state-of-the-art results. Source code available at https://github.com/facebookresearch/GENRE.
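The marginalization over the target language mentioned in the abstract can be illustrated with a small sketch. This is not the mGENRE implementation: the function names, the entity identifiers, and the log-probabilities below are hypothetical, and the sketch assumes a model has already produced one joint log-score per (entity, language) candidate name.

```python
import math
from collections import defaultdict


def marginalize_over_languages(candidates):
    """Sum the probability of each entity over all languages of its name.

    `candidates` is an iterable of (entity_id, language, log_prob) triples,
    where log_prob is an assumed log-score for generating that entity's
    name in that language. Returns {entity_id: log of the summed probability}.
    """
    grouped = defaultdict(list)
    for entity_id, _language, log_prob in candidates:
        grouped[entity_id].append(log_prob)

    scores = {}
    for entity_id, log_probs in grouped.items():
        # Log-sum-exp, shifted by the maximum for numerical stability.
        peak = max(log_probs)
        scores[entity_id] = peak + math.log(
            sum(math.exp(lp - peak) for lp in log_probs)
        )
    return scores


def best_entity(candidates):
    """Return the entity with the highest marginal score."""
    scores = marginalize_over_languages(candidates)
    return max(scores, key=scores.get)


# Hypothetical scores: Q2 has the single best name, but Q1 wins once the
# probability mass of its English and Italian names is added together.
example = [
    ("Q1", "en", math.log(0.3)),
    ("Q1", "it", math.log(0.3)),
    ("Q2", "en", math.log(0.4)),
]
print(best_entity(example))
```

Summing in log space keeps the computation stable when the individual name probabilities are very small, which is typical for scores of long generated sequences.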


Toolformer: Language Models Can Teach Themselves to Use Tools

Schick, Dwivedi-Yu, Dessì, et al. 2023

Preprint


Translating with non-contiguous phrases

Simard, Cancedda, Cavestro, et al. 2005


This paper presents a phrase-based statistical machine translation method based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from word-aligned corpora is proposed. A statistical translation model that deals with such phrases is also presented, as well as a training method based on the maximization of translation accuracy, as measured with the NIST evaluation metric. Translations are produced by means of a beam-search decoder. Experimental results are presented that demonstrate how the proposed method allows better generalization from the training data.


Generation of Compound Words in Statistical Machine Translation into Compounding Languages

Stymne, Cancedda, Ahrenberg 2013

Computational Linguistics


In this article we investigate statistical machine translation (SMT) into Germanic languages, with a focus on compound processing. Our main goal is to enable the generation of novel compounds that have not been seen in the training data. We adopt a split-merge strategy, where compounds are split before training the SMT system, and merged after the translation step. This approach reduces sparsity in the training data, but runs the risk of placing translations of compound parts in non-consecutive positions. It also requires a postprocessing step of compound merging, where compounds are reconstructed in the translation output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order and show that it can lead to improvements both by direct inspection and in terms of standard translation evaluation metrics. We also propose several new methods for compound merging, based on heuristics and machine learning, which outperform previously suggested algorithms. These methods can produce novel compounds and a translation with at least the same overall quality as the baseline. For all subtasks we show that it is useful to include part-of-speech based information in the translation process, in order to handle compounds


Confusion Matrix

Shultz, Fahlman, Craw, et al. 2011
