Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), Main Conference, 2006
DOI: 10.3115/1220835.1220849
Alignment by agreement

Abstract: We present an unsupervised approach to symmetric word alignment in which two simple asymmetric models are trained jointly to maximize a combination of data likelihood and agreement between the models. Compared to the standard practice of intersecting predictions of independently-trained models, joint training provides a 32% reduction in AER. Moreover, a simple and efficient pair of HMM aligners provides a 29% reduction in AER over symmetrized IBM model 4 predictions.
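The decoding step described in the abstract can be illustrated with a minimal sketch: the two asymmetric models each produce posterior probabilities over alignment links, and links are kept when the product of the two posteriors is high. The matrices below are illustrative toy numbers, not output from any trained model, and the 0.5 threshold is one common choice rather than a prescribed value.

```python
import numpy as np

# Toy posterior matrices over alignment links for a 2-word sentence pair.
# p_ef[i, j] = posterior that source word i aligns to target word j under
# the e->f model; p_fe[i, j] = posterior for the same link under the f->e
# model. (Illustrative numbers only.)
p_ef = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
p_fe = np.array([[0.8, 0.3],
                 [0.1, 0.9]])

# Combine the two models by multiplying their posteriors per link, then
# keep links whose combined score clears a threshold.
combined = p_ef * p_fe
links = {(i, j)
         for i in range(combined.shape[0])
         for j in range(combined.shape[1])
         if combined[i, j] > 0.5}
print(sorted(links))  # -> [(0, 0), (1, 1)]
```

Multiplying posteriors rewards links both directions agree on, which is the intuition behind training the models to agree in the first place.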

Cited by 240 publications (247 citation statements) · References 12 publications
“…The idea of bidirectional translation is also not unique to CLIR. Machine translation researchers leverage a comparable idea ("alignment by agreement"), which is now available as a replacement for GIZA++ in the Berkeley Aligner (Liang et al., 2006). Comparison of our implementations of IMM and DAMM with variants based on Berkeley alignment results would be a logical first next step towards understanding the potential of these alignments in CLIR applications…”
Section: Results
confidence: 99%
“…For building our APE_B2 system, we set a maximum phrase length of 7 for the translation model, and a 5-gram language model was trained using KenLM (Heafield, 2011). Word alignments between the mt and pe (4.5M synthetic mt-pe data + 12K WMT APE data) were established using the Berkeley Aligner (Liang et al., 2006), while word pairs from hybrid prior alignment (Section 2.1) between mt-pe (12K data) were used for the additional training data to build APE_B2. The reordering model was trained with the hierarchical, monotone, swap, left-to-right bidirectional (hier-mslr-bidirectional) method (Galley and Manning, 2008) and conditioned on both the source and target language.…”
Section: Experiments and Results
confidence: 99%
“…The monolingual mt-pe parallel corpus is first word aligned using a hybrid word alignment method based on the alignment combination of three different statistical word alignment methods: (i) GIZA++ (Och, 2003) word alignment with the grow-diag-final-and (GDFA) heuristic (Koehn, 2010), (ii) Berkeley word alignment (Liang et al., 2006), and (iii) SymGiza++ (Junczys-Dowmunt and Szał, 2012) word alignment, as well as two different edit-distance-based word aligners based on Translation Edit Rate (TER) (Snover et al., 2006) and METEOR (Lavie and Agarwal, 2007). We follow the alignment strategy described in (Pal et al., 2013; Pal et al., 2016a).…”
Section: Hybrid Prior Alignment
confidence: 99%
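The hybrid combination quoted above merges link sets from several aligners. The actual strategy of Pal et al. is more elaborate, but one simple way to realise such a combination is majority voting over proposed links; the sketch below uses that approach with hypothetical toy link sets, not output from the cited aligners.

```python
from collections import Counter

def combine_alignments(alignment_sets, min_votes=2):
    """Keep an alignment link if at least `min_votes` aligners propose it.

    Each element of `alignment_sets` is a set of (source_index,
    target_index) link pairs from one aligner.
    """
    votes = Counter(link for links in alignment_sets for link in links)
    return {link for link, n in votes.items() if n >= min_votes}

# Toy link sets standing in for three aligners' outputs.
giza = {(0, 0), (1, 1), (2, 2)}
berkeley = {(0, 0), (1, 1), (2, 3)}
symgiza = {(0, 0), (1, 2), (2, 2)}

print(sorted(combine_alignments([giza, berkeley, symgiza])))
# -> [(0, 0), (1, 1), (2, 2)]
```

Raising `min_votes` to the number of aligners reduces this to a plain intersection, the conservative baseline that agreement-based methods improve on.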
“…Alternative approaches could be considered for some of the steps of our rule learning procedure in order to further improve the results obtained. The word alignment quality could be improved by integrating symmetrisation into the training of the alignment models as shown by Liang et al. (2006), who reported a reduction in the alignment error rate with small parallel corpora. Regarding the optimisation performed to discard rules that cause a deficient chunking of the sentences to be translated, some changes could be made to the evaluation metric used to compute the set of key text segments I; for instance, Nakov et al. (2012) suggest some improvements to BLEU smoothing which are well suited to sentence-level optimisation.…”
Section: Discussion
confidence: 99%