Lucia Specia scite author profile

Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, and dialog and conversational systems. The STS shared task is a venue for assessing the current state of the art. The 2017 task focuses on multilingual and cross-lingual pairs, with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well-performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).

Findings of the 2014 Workshop on Statistical Machine Translation

Bojar et al. 2014

This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries.

Findings of the 2016 Conference on Machine Translation

Bojar, Chatterjee, Federmann et al. 2016

Bojar, O.; Chatterjee, R.; Federmann, C.; Graham, Y.; Haddow, B.; Huck, M.; Jimeno Yepes, A.; Koehn, P.; Logacheva, V.; Monz, C.; Negri, M.; Névéol, A.; Neves, M.; Popel, M.; Post, M.; Rubino, R.; Scarton, C.; Specia, L.; Turchi, M.; Verspoor, K.; Zampieri, M.

This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.

http://statmt.org/wmt16/results.html

Findings of the 2017 Conference on Machine Translation (WMT17)

Bojar, Chatterjee, Federmann et al. 2017

Multi30K: Multilingual English-German Image Descriptions

Elliott et al. 2016

We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated on English-language datasets almost exclusively, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) German descriptions crowdsourced independently of the original English descriptions. We describe the data and outline how it can be used for multilingual image description and multimodal machine translation, but we anticipate the data will be useful for a broader range of tasks.

Findings of the 2015 Workshop on Statistical Machine Translation

et al. 2015

SemEval 2016 Task 11: Complex Word Identification

Paetzold, Specia 2016

We report the findings of the Complex Word Identification task of SemEval 2016. To create a dataset, we conduct a user study with 400 non-native English speakers, and find that complex words tend to be rarer, less ambiguous and shorter. A total of 42 systems were submitted from 21 distinct teams, and nine baselines were provided. The results highlight the effectiveness of Decision Trees and Ensemble methods for the task, but ultimately reveal that word frequencies remain the most reliable predictor of word complexity.

Integrating Folksonomies with the Semantic Web

While tags in collaborative tagging systems serve primarily an indexing purpose, facilitating search and navigation of resources, the use of the same tags by more than one individual can yield a collective classification schema. We present an approach for making explicit the semantics behind the tag space in social tagging systems, so that this collaborative organization can emerge in the form of groups of concepts and partial ontologies. This is achieved by using a combination of shallow pre-processing strategies and statistical techniques together with knowledge provided by ontologies available on the semantic web. Preliminary results on the del.icio.us and Flickr tag sets show that the approach is very promising: it generates clusters of highly related tags corresponding to concepts in ontologies, and meaningful relationships among subsets of these tags can be identified.


Lucia Specia

SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
