UNRAVEL—A Decipherment Toolkit

Nuhn, Malte; Schamper, Julian; Ney, Hermann

doi:10.3115/v1/p15-2090

Cited by 5 publications

(8 citation statements)

References 7 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…UNRAVEL (Nuhn et al, 2015) searches for a mapping of letters that maximize the probability of the decipherment under an n-gram character language model. Partial key mappings are structured into a search tree, and a beam search is used to traverse the tree and find the most promising candidates.…”

Section: Computational Deciphermentmentioning

confidence: 99%

Experimental Analysis of the Dorabella Cipher with Statistical Language Models

Hauer¹,

Choi²,

Sundar³

et al. 2021

Linköping Electronic Conference Proceedings

View full text Add to dashboard Cite

The Dorabella cipher is a symbolic message written in 1897 by English composer Edward Elgar. We analyze the cipher using modern computational and statistical techniques. We consider several open questions: Is the underlying message natural language text or music? If it is language, what is the most likely language? Is Dorabella a simple substitution cipher? If so, why has nobody managed to produce a plausible decipherment? Are some unusual-looking patterns in the cipher likely to occur by chance? Can stateof-the-art algorithmic solvers decipher at least some words of the message? This work is intended as a contribution towards finding answers to these questions.

show abstract

Section: Computational Deciphermentmentioning

confidence: 99%

Experimental Analysis of the Dorabella Cipher with Statistical Language Models

Hauer¹,

Choi²,

Sundar³

et al. 2021

Linköping Electronic Conference Proceedings

View full text Add to dashboard Cite

show abstract

“…Nuhn and colleagues (Nuhn, Schamper, and Ney 2013;Nuhn and Ney 2014;Nuhn, Schamper, and Ney 2015) showed that beam search can significantly improve the speed of EM-based decipherment, while providing comparable or even slightly better accuracy. Beam search prunes less-promising latent states by maintaining two constantsized beams, one for the translation probabilities p( f |e) and one for the target bigram probabilities p(e 1 e 2 )-reducing the computational complexity to O(N F ).…”

Section: Beam Searchmentioning

confidence: 99%

Feature-Based Decipherment for Machine Translation

Naim

Riley

Gildea

2018

Computational Linguistics

View full text Add to dashboard Cite

Orthographic similarities across languages provide a strong signal for probabilistic decipherment, especially for closely related language pairs. The existing decipherment models, however, are not wellsuited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed loglinear model. To address this challenge, we perform approximate inference via MCMC sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence scales to large vocabularies and outperforms the existing generative decipherment models by exploiting the orthographic features.

show abstract

“…For further speedup, we applied perposition pruning with histogram size 50 and the preselection method of Nuhn and Ney (2014) with lexical beam size 5 and LM beam size 50. All our experiments were carried out with the UNRAVEL toolkit (Nuhn et al, 2015). Table 4 summarizes the results.…”

Section: Large Vocabulary Experimentsmentioning

confidence: 99%

Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

Kim

Schamper

Ney

2017

Proceedings of the 15th Conference of the European Chapter of The Association for Computational Linguistics: Volume 2

Self Cite

View full text Add to dashboard Cite

We address for the first time unsupervised training for a translation task with hundreds of thousands of vocabulary words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we solve the memory bottleneck and enforce the sparsity with a simple thresholding scheme for the lexicon. Second, we initialize the lexicon training with word classes, which efficiently boosts the performance. Our methods produced promising results on two large-scale unsupervised translation tasks.

show abstract

UNRAVEL—A Decipherment Toolkit

Cited by 5 publications

References 7 publications

Experimental Analysis of the Dorabella Cipher with Statistical Language Models

Experimental Analysis of the Dorabella Cipher with Statistical Language Models

Feature-Based Decipherment for Machine Translation

Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

Contact Info

Product

Resources

About