Proceedings of the Third Workshop on Discourse in Machine Translation 2017
DOI: 10.18653/v1/w17-4813
Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation

Abstract: The phrase-based Statistical Machine Translation (SMT) approach deals with sentences in isolation, making it difficult to consider discourse context in translation. This poses a challenge for ambiguous words that need discourse knowledge to be correctly translated. We propose a method that benefits from the semantic similarity in lexical chains to improve SMT output by integrating it in a document-level decoder. We focus on word embeddings to deal with the lexical chains, contrary to the traditional approach t…
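The abstract describes linking words into lexical chains by semantic similarity in embedding space rather than by traditional thesaurus relations. A minimal sketch of that idea, not the paper's actual algorithm: greedily append each word to an existing chain whose most recent member is close enough in cosine similarity, otherwise start a new chain. The toy 2-d embeddings and the threshold value are hypothetical, for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_lexical_chains(words, embeddings, threshold=0.6):
    """Greedily link each word to the first chain whose last word is
    similar enough in embedding space; otherwise open a new chain."""
    chains = []
    for w in words:
        for chain in chains:
            if cosine(embeddings[w], embeddings[chain[-1]]) >= threshold:
                chain.append(w)
                break
        else:
            chains.append([w])
    return chains

# Toy 2-d embeddings (hypothetical values, purely illustrative).
emb = {
    "bank":  [0.90, 0.10],
    "money": [0.85, 0.20],
    "river": [0.10, 0.90],
    "water": [0.15, 0.85],
}
print(build_lexical_chains(["bank", "money", "river", "water"], emb))
# → [['bank', 'money'], ['river', 'water']]
```

In a document-level decoder, such chains would supply a cross-sentence signal: an ambiguous word's translation can be scored for consistency with the chain it belongs to.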

Cited by 10 publications (8 citation statements)
References 29 publications
“…Statistical Machine Translation (SMT) Initial studies were based on cache memories (Tiedemann, 2010; Gong et al., 2011). However, most of the work explicitly models discourse phenomena (Sim Smith, 2017) such as lexical cohesion (Meyer and Popescu-Belis, 2012; Xiong et al., 2013; Loáiciga and Grisot, 2016; Pu et al., 2017; Mascarell, 2017), coherence (Born et al., 2017), and coreference (Rios Gonzales and Tuggener, 2017; Miculicich Werlen and Popescu-Belis, 2017a). Hardmeier et al. (2013) introduced the document-level SMT paradigm.…”
Section: Related Work
confidence: 99%
“…Conventional Document-level MT These can further be classified into two main categories: the first uses cache-based memories (Tiedemann, 2010; Gong et al., 2011), and the second focuses on specific discourse phenomena such as anaphora (Hardmeier and Federico, 2010), lexical cohesion (Xiong et al., 2013; Gong et al., 2015; Mascarell, 2017), and coreference (Miculicich Werlen and Popescu-Belis, 2017), to name a few. Most of these approaches are, however, restrictive, as they mostly involve handcrafted features similar to the conventional MT approaches.…”
Section: Related Work
confidence: 99%
“…These words are related sequentially in the text, defining the topic of the text segment they cover and establishing associations between sentences. Following this observation, some researchers have had success in many NLP tasks such as word sense induction (Tao et al., 2014), machine translation (Mascarell, 2017), and text segmentation (Stokes et al., 2004). In the BB-rel dataset, the sentences where inter-sentence relations occur usually express the same topic or have semantic associations with each other.…”
Section: Contained Three
confidence: 98%