Proceedings of the 22nd International Conference on Computational Linguistics - COLING '08 2008
DOI: 10.3115/1599081.1599225

A systematic comparison of phrase-based, hierarchical and syntax-augmented statistical MT

Abstract: Probabilistic synchronous context-free grammar (PSCFG) translation models define weighted transduction rules that represent translation and reordering operations via nonterminal symbols. In this work, we investigate the source of the improvements in translation quality reported when using two PSCFG translation models (hierarchical and syntax-augmented), when extending a state-of-the-art phrase-based baseline that serves as the lexical support for both PSCFG models. We isolate the impact on translation quality f…

Cited by 29 publications (28 citation statements); references 14 publications.
“…A significant improvement in English-Danish machine translation has been achieved [8]. Representing the source language at the phrase level, converting source-language phrases into target-language phrases, and reordering the target-language phrases generates fluent target output [9]. Phrase-based representation and translation have produced better Chinese-English, Arabic-English and Urdu-English machine translation systems [9].…”
Section: Knowledge Representation of Urdu Text and Predicate Logic (mentioning)
confidence: 99%
“…These translation models comprise translationally equivalent sequences of words, so-called phrase pairs, that are extracted from aligned sentence pairs using heuristics over a statistical word alignment. While phrase-based models have achieved state-of-the-art translation quality, evidence suggests there is a limit as to what can be accomplished using only simple phrases, for example, satisfactory capturing of context-sensitive reordering phenomena between language pairs (Zollmann et al 2008). This assertion has been acknowledged within the field as illustrated by the recent shift in focus towards more linguistically motivated models.…”
Section: Introduction (mentioning)
confidence: 96%
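The phrase pairs mentioned above are conventionally extracted with a consistency heuristic over a word alignment: a source span and target span form a pair only if no alignment link crosses the span boundary. A minimal sketch of that standard heuristic, using a toy sentence pair that is purely illustrative (not drawn from the cited experiments):

```python
# Minimal sketch of consistent phrase-pair extraction from a word
# alignment. The sentence pair and alignment below are toy examples.
def extract_phrases(src, tgt, alignment, max_len=3):
    """Return phrase pairs consistent with `alignment`,
    given as a set of (src_index, tgt_index) links."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the candidate source span.
            tgt_pos = {j for (i, j) in alignment if i1 <= i <= i2}
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            # Consistency: no word inside the target span may be linked
            # to a source word outside the source span.
            if any(not (i1 <= i <= i2)
                   for (i, j) in alignment if j1 <= j <= j2):
                continue
            if j2 - j1 < max_len:
                pairs.add((" ".join(src[i1:i2 + 1]),
                           " ".join(tgt[j1:j2 + 1])))
    return pairs

pairs = extract_phrases(["das", "Haus"], ["the", "house"], {(0, 0), (1, 1)})
print(sorted(pairs))
```

For this toy alignment the extractor yields the two word-level pairs plus the full-sentence pair; real systems add further heuristics (e.g. handling unaligned words) on top of this core check.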
“…If the rule set can be reduced without reducing translation quality, both memory efficiency and translation speed can be increased. Previously published approaches to reducing the rule set include: enforcing a minimum span of two words per non-terminal (Lopez, 2008), which would reduce our set to 115M rules; or a minimum count (mincount) threshold (Zollmann et al, 2008), which would reduce our set to 78M (mincount=2) or 57M (mincount=3) rules. Shen et al (2008) describe the result of filtering rules by insisting that target-side rules are well-formed dependency trees.…”
Section: Rule Filtering by Pattern (mentioning)
confidence: 99%
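The mincount threshold described in the statement above amounts to dropping every grammar rule observed fewer than a fixed number of times during extraction. A hedged sketch of that filtering step; the rule strings and counts below are invented for illustration, not taken from the cited experiments:

```python
# Sketch of mincount-based rule filtering: keep only rules whose
# extraction count meets a threshold. Rules and counts are illustrative.
from collections import Counter

def filter_by_mincount(rule_counts, mincount):
    """Keep only rules observed at least `mincount` times."""
    return {rule: c for rule, c in rule_counts.items() if c >= mincount}

rule_counts = Counter({
    "X -> <la maison ||| the house>": 5,
    "X -> <maison X ||| house X>": 2,
    "X -> <la X ||| the X>": 1,
})
print(len(filter_by_mincount(rule_counts, 2)))  # rules surviving mincount=2
```

Raising mincount trades grammar coverage for memory and speed, which is why the statement above contrasts this coarse filter with finer, pattern-specific thresholds.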
“…As another reference point, Chiang (2007) reports Chinese-to-English translation experiments based on 5.5M rules. Zollmann et al (2008) report that filtering rules en masse leads to degradation in translation performance. Rather than apply a coarse filtering, such as a mincount for all rules, we follow a more syntactic approach and further classify our rules according to their pattern and apply different filters to each pattern depending on its value in translation.…”
Section: Rule Filtering by Pattern (mentioning)
confidence: 99%