Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.3115/v1/d14-1210

Latent-Variable Synchronous CFGs for Hierarchical Translation

Abstract: Data-driven refinement of non-terminal categories has been demonstrated to be a reliable technique for improving monolingual parsing with PCFGs. In this paper, we extend these techniques to learn latent refinements of single-category synchronous grammars, so as to improve translation performance. We compare two estimators for this latent-variable model: one based on EM, the other a spectral algorithm based on the method of moments. We evaluate their performance on a Chinese-English translation task. The …
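The EM estimator mentioned in the abstract refines the grammar's single nonterminal category into latent subcategories and re-estimates refined rule probabilities by inside-outside over derivation trees. Below is a minimal, self-contained sketch of that idea in the spirit of Matsuzaki-style annotation splitting; it is not the paper's implementation, and the Node class, the toy rule names, and K = 2 latent states are illustrative assumptions.

```python
# Hedged sketch: EM over latent refinements X_1..X_K of a single category X,
# run on fixed derivation trees. Toy data structures; not the paper's code.
import random
from collections import defaultdict

K = 2  # number of latent subcategories (assumption for illustration)

class Node:
    """Derivation-tree node: a rule id with 0 (lexical) or 2 child subtrees."""
    def __init__(self, rule, children=()):
        self.rule, self.children = rule, children

def init_probs(trees):
    """Near-uniform random init; EM needs broken symmetry to learn splits."""
    w = [dict() for _ in range(K)]
    def walk(n):
        keys = ([(n.rule, ())] if not n.children else
                [(n.rule, (a, b)) for a in range(K) for b in range(K)])
        for k in range(K):
            for key in keys:
                w[k][key] = 1.0 + 0.1 * random.random()
        for c in n.children:
            walk(c)
    for t in trees:
        walk(t)
    # Each parent latent state gets a distribution over refined rule choices.
    return [{key: v / sum(wk.values()) for key, v in wk.items()} for wk in w]

def inside(node, P, memo):
    """memo[node][k] = P(subtree below node | parent latent state k)."""
    if not node.children:
        memo[node] = [P[k].get((node.rule, ()), 0.0) for k in range(K)]
    else:
        l, r = node.children
        inside(l, P, memo); inside(r, P, memo)
        memo[node] = [sum(P[k].get((node.rule, (a, b)), 0.0)
                          * memo[l][a] * memo[r][b]
                          for a in range(K) for b in range(K))
                      for k in range(K)]
    return memo[node]

def accumulate(node, out, P, memo, counts):
    """Outside pass: add posterior expected counts for each refined rule."""
    if not node.children:
        for k in range(K):
            counts[k][(node.rule, ())] += out[k] * P[k].get((node.rule, ()), 0.0)
        return
    l, r = node.children
    lout, rout = [0.0] * K, [0.0] * K
    for k in range(K):
        for a in range(K):
            for b in range(K):
                p = P[k].get((node.rule, (a, b)), 0.0)
                counts[k][(node.rule, (a, b))] += out[k] * p * memo[l][a] * memo[r][b]
                lout[a] += out[k] * p * memo[r][b]
                rout[b] += out[k] * p * memo[l][a]
    accumulate(l, lout, P, memo, counts)
    accumulate(r, rout, P, memo, counts)

def em_step(trees, P):
    """One EM iteration: E-step via inside-outside, M-step renormalizes."""
    counts = [defaultdict(float) for _ in range(K)]
    for t in trees:
        memo = {}
        ins = inside(t, P, memo)
        z = sum(ins) / K                     # uniform prior over root states
        accumulate(t, [1.0 / (K * z)] * K, P, memo, counts)
    return [{key: v / sum(c.values()) for key, v in c.items()} for c in counts]

# Toy usage: one two-level derivation with made-up synchronous rule names.
trees = [Node("X -> <X1 X2, X2 X1>",
              (Node("X -> <a, a'>"), Node("X -> <b, b'>")))]
P = init_probs(trees)
for _ in range(10):
    P = em_step(trees, P)
```

In the paper itself the refinements apply to synchronous rules and training runs over derivations of the bitext; the fixed-tree version above only illustrates the shape of the estimator.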

Cited by 4 publications (8 citation statements) | References 21 publications

“…
• Our model is completely monolingual and unlexicalized (it does not condition its reordering on the translation), in contrast with the latent SCFG used in (Saluja et al., 2014);
• Our latent PCFG label splits are defined as refinements of prime permutations, i.e., designed specifically for learning reordering, whereas (Saluja et al., 2014) aims at learning label splits that help predict NDTs from source sentences;
• Our model exploits all PETs and all derivations, both during training (latent treebank) and during inference; in (Saluja et al., 2014) only left-branching NDT derivations are used for learning the model.…”
Section: Related Work (mentioning)
confidence: 99%
“…If we are to choose a single PET per training instance, then learning RG from only left-branching PETs (the one usually chosen in other work, e.g., (Saluja et al., 2014)) performs slightly worse than from the right-branching PET. This is possibly because English is mostly right-branching.…”
Section: Extrinsic Evaluation in MT (mentioning)
confidence: 99%
“…Learning latent-variable SCFGs for hierarchical translation is explored by Saluja et al. (2014). This work uses spectral learning or the EM algorithm to learn tensors that capture the latent-variable information of rules.…”
Section: Learning Labels (mentioning)
confidence: 99%
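The spectral estimator referenced here and in the abstract follows the method of moments: cross-moments of inside and outside feature vectors are projected onto a low-rank subspace via SVD, and rule parameters are then read off as projected higher-order moments (tensors). The sketch below is schematic only, with random stand-in features; the dimensions, sample count, and variable names are assumptions, and the full spectral algorithm includes normalization corrections omitted here.

```python
# Schematic method-of-moments sketch for latent-variable grammar rules.
# Random stand-in features; omits the normalization steps of the full
# spectral algorithm. Not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
m, d, k = 1000, 20, 2   # samples, feature dim, latent states (assumptions)

# Stand-ins for inside features phi and outside features psi at tree nodes.
Phi = rng.normal(size=(m, d))
Psi = rng.normal(size=(m, d))

# Step 1: empirical cross-moment Sigma ~ E[phi psi^T].
Sigma = Phi.T @ Psi / m

# Step 2: rank-k SVD yields projections onto the latent subspace.
U, s, Vt = np.linalg.svd(Sigma)
proj_in, proj_out = U[:, :k], Vt[:k, :].T   # d x k each

Y_in = Phi @ proj_in     # m x k projected inside features
Y_out = Psi @ proj_out   # m x k projected outside features

# Step 3: a binary rule's parameters live in a k x k x k tensor, estimated
# as a projected third-order moment over (parent-outside, left-inside,
# right-inside) features; both child roles reuse Y_in here for brevity.
C = np.einsum('ni,nj,nl->ijl', Y_out, Y_in, Y_in) / m
print(C.shape)           # (k, k, k): a latent rule tensor, as in the citation
```

Unlike EM, this estimate involves no iterative likelihood climbing: the SVD and moment averages are computed once, which is one practical motivation for the paper's comparison of the two estimators.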