Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP '08), 2008
DOI: 10.3115/1622110.1622119

Parallel implementations of word alignment tool

Abstract: Training word alignment models on large corpora is a very time-consuming process. This paper describes two parallel implementations of GIZA++ that accelerate this word alignment process: one runs on computer clusters, the other on multi-processor systems using multi-threading technology. Results show a near-linear speed-up with the number of CPUs used, and alignment quality is preserved.
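To make the multi-threaded approach concrete, here is a minimal sketch, assuming a simplified IBM Model 1 trainer: each worker sweeps its own shard of the corpus during the E-step, and the per-worker partial counts are merged before the M-step at the end of each iteration. This mirrors the synchronisation pattern the abstract describes, but all names (collect_counts, em_iteration) and the structure are our own illustration, not the GIZA++/MGIZA++ code; note also that MGIZA++ uses native C++ threads, whereas CPython threads are limited by the GIL.

```python
# Minimal sketch of one EM iteration for IBM Model 1 with the E-step
# fanned out over worker threads (illustrative only, not GIZA++ code).
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def collect_counts(shard, t):
    """E-step over one corpus shard: expected counts under t[(f, e)]."""
    counts, totals = defaultdict(float), defaultdict(float)
    for f_sent, e_sent in shard:
        for f in f_sent:
            norm = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                p = t[(f, e)] / norm
                counts[(f, e)] += p   # expected co-occurrence count
                totals[e] += p        # normaliser for the M-step
    return counts, totals

def em_iteration(corpus, t, n_workers=4):
    shards = [corpus[i::n_workers] for i in range(n_workers)]
    counts, totals = defaultdict(float), defaultdict(float)
    # Workers process shards independently; their partial counts are
    # merged here, the per-iteration synchronisation point.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for c, tot in pool.map(lambda s: collect_counts(s, t), shards):
            for key, v in c.items():
                counts[key] += v
            for e, v in tot.items():
                totals[e] += v
    # M-step: re-normalise the translation table.
    return {(f, e): v / totals[e] for (f, e), v in counts.items()}

# Toy usage: uniform initialisation, then a few EM sweeps.
corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]
pairs = {(f, e) for fs, es in corpus for f in fs for e in es}
t = {p: 1.0 / len(pairs) for p in pairs}
for _ in range(5):
    t = em_iteration(corpus, t)
```

The cluster variant the paper describes distributes shards across machines rather than threads and exchanges the merged counts at the same per-iteration synchronisation point.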

Cited by 195 publications (138 citation statements). References 8 publications (9 reference statements).
“…In fact, several recent articles have reported on reproducibility and/or replication problems in the HLT field (e.g., Johnson et al. 2007; Poprat et al. 2008; Gao and Vogel 2008; Caporaso et al. 2008; Kano et al. 2009; Fokkens et al. 2013; Hagen et al. 2015), and two recent workshops 1 have addressed the need for replication and reproduction of HLT results. However, there is no established venue for publications on the topic, and perhaps more problematically, research that investigates existing methods rather than introducing new ones is often implicitly discouraged in the process of peer review.…”
mentioning
confidence: 99%
“…We again compare the benchmark with sets of automatically reordered Chinese sentences generated in the same way as in the first scenario. Word alignments between Chinese and Japanese are produced by MGIZA++ [97] in a file named ch-ja.A3.final. In this file, parallel sentence pairs (Chinese and Japanese) are aligned to each other as follows:…”
Section: Discussion
mentioning
confidence: 99%
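For readers unfamiliar with the file mentioned above: GIZA++/MGIZA++ *.A3.final output conventionally stores each sentence pair as three lines: a header comment with the sentence lengths and alignment score, the target sentence, and the source sentence in which every token is followed by ({ ... }) listing the 1-based target positions it aligns to, with a leading NULL token collecting unaligned words. The reader below is a minimal sketch under that assumption; the file name ch-ja.A3.final comes from the quote, while read_a3 and the rest are our own.

```python
import re

# Hedged sketch of a reader for GIZA++/MGIZA++ *.A3.final files.
# Assumed layout per sentence pair (three lines):
#   # Sentence pair (N) source length S target length T alignment score : P
#   <target sentence>
#   NULL ({ ... }) src1 ({ 1 2 }) src2 ({ 3 }) ...
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")

def read_a3(path):
    with open(path, encoding="utf-8") as fh:
        lines = [line.rstrip("\n") for line in fh]
    for i in range(0, len(lines) - 2, 3):
        target = lines[i + 1].split()
        source, links = [], []
        for token, positions in TOKEN_RE.findall(lines[i + 2]):
            if token == "NULL":
                continue  # NULL collects target words aligned to nothing
            source.append(token)
            for pos in positions.split():
                links.append((len(source) - 1, int(pos) - 1))  # 0-based
        yield source, target, links

# e.g. for source, target, links in read_a3("ch-ja.A3.final"): ...
```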
“…The standard Moses [96] baseline was used, where reordered Chinese sentences were paired with their Japanese counterparts and word-to-word alignments were estimated by using MGIZA++ [9,97].…”
Section: Et Cetera
mentioning
confidence: 99%
“…First, each entry is segmented with the BPE rules available along with the pre-trained Nematus model. Then, the segmented entries are aligned by running MGiza++ (Gao and Vogel, 2008) trained on the BPE-level WMT'16 training data. Finally, all the one-to-one aligned sub-units are extracted to form the sub-word level bilingual term dictionaries.…”
Section: Experimental Setting
mentioning
confidence: 99%
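The final extraction step quoted above admits a simple sketch: keep only those alignment links whose source and target positions each occur in exactly one link, i.e. the strictly one-to-one sub-word pairs. This is our reading of the quoted procedure, not the authors' code; the function name and the toy data are assumptions.

```python
from collections import Counter

def one_to_one_pairs(src_tokens, tgt_tokens, links):
    """Keep links whose source and target positions are both unique.

    links: (source_index, target_index) pairs, e.g. from an MGIZA++
    run after the usual symmetrisation step.
    """
    src_degree = Counter(i for i, _ in links)
    tgt_degree = Counter(j for _, j in links)
    return {(src_tokens[i], tgt_tokens[j])
            for i, j in links
            if src_degree[i] == 1 and tgt_degree[j] == 1}

# Toy example with BPE-style sub-word units (hypothetical data):
src = ["Kat@@", "ze", "schläft"]
tgt = ["cat", "sleeps"]
links = [(0, 0), (1, 0), (2, 1)]
print(one_to_one_pairs(src, tgt, links))  # {('schläft', 'sleeps')}
```

Links touching "cat" are discarded because two sub-units compete for it; only the unambiguous pair survives into the dictionary.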