2011
DOI: 10.2478/v10108-011-0010-5
Ncode: an Open Source Bilingual N-gram SMT Toolkit

Abstract: This paper describes N, an open source statistical machine translation (SMT) toolkit for translation models estimated as n-gram language models of bilingual units (tuples). This toolkit includes tools for extracting tuples, estimating models and performing translation. It can be easily coupled to several other open source toolkits to yield a complete SMT pipeline. In this article, we review the main features of the toolkit and explain how to build a translation engine with N. We also report a short com… Show more

Cited by 16 publications (3 citation statements); References 5 publications.
“…Since the translation step is monotonic, the peculiarity of this approach relies on the use of an n-gram translation model that estimates the probability of a sequence of bilingual units. Along with the n-gram translation model and a target n-gram language model, 13 conventional features are combined in Equation 7: 4 lexicon models similar to the ones used in standard phrase-based systems; 6 lexicalized reordering models [37,15] aimed at predicting the orientation of the next translation unit; a "weak" distance-based distortion model; and finally a word-bonus model and a tuple-bonus model which compensate for the system preference for short translations.…”
Section: Manual Transcripts Translation
confidence: 99%
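For context, the 13 features described in this excerpt are combined in the usual log-linear fashion of SMT decoders; the sketch below uses the generic weights λ_m and feature functions h_m of that formulation, not necessarily the exact notation of the cited paper's Equation 7:

```latex
\hat{t} \;=\; \operatorname*{arg\,max}_{t} \; \sum_{m=1}^{13} \lambda_m \, h_m(s, t)
```

where one of the features is the bilingual n-gram translation model itself, factored over the tuple sequence $u_1, \ldots, u_K$:

```latex
h_{\mathrm{tm}}(s, t) \;=\; \log \prod_{k=1}^{K} p\big(u_k \mid u_{k-n+1}, \ldots, u_{k-1}\big)
```

The remaining features (lexicon models, lexicalized reordering, distortion, word and tuple bonuses) enter the sum as additional $h_m$ terms, with the weights $\lambda_m$ tuned on development data.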
“…4.2.1 Baseline Systems. We compared our system with (i) Moses (Koehn et al. 2007), (ii) Phrasal (Cer et al. 2010), and (iii) Ncode (Crego, Yvon, and Mariño 2011). We used all these toolkits with their default settings.…”
Section: Initial Evaluation
confidence: 99%
“…The second one retrieves the scores for all words in the vocabulary associated with a state, which is very useful to compute LM look-ahead scores (LMLA was described in detail in Section 10.6.7): the Moses framework [Koehn et al. 2007] also defines language model classes which seem to provide, at the same time, both n-gram and finite state automata methods. Worth mentioning is the OpenFst library [Allauzen et al. 2007], which is used by several HTR, ASR and SMT decoders such as, respectively, OCRopus [Breuel 2008], Kaldi [Povey et al. 2011] and Ncode [Crego et al. 2011], among others.…”
Section: Automaton Interface
confidence: 99%
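The two query styles this excerpt contrasts — scoring a single word given an LM state versus retrieving scores for the whole vocabulary at once for look-ahead — can be sketched with a toy bigram model. This is a minimal illustration under assumed names (`NgramLM`, `score`, `lookahead`); real decoders such as Ncode rely on optimized libraries (e.g. OpenFst-based automata) rather than anything like this:

```python
import math
from collections import defaultdict

class NgramLM:
    """Toy bigram LM exposing the two query methods described in the text:
    a per-word score (returning the next LM state) and a bulk query over
    the whole vocabulary, as used for LM look-ahead (LMLA)."""

    def __init__(self, bigram_counts):
        self.vocab = set()
        counts = defaultdict(dict)
        totals = defaultdict(int)
        for (h, w), c in bigram_counts.items():
            counts[h][w] = c
            totals[h] += c
            self.vocab.update([h, w])
        # Laplace-smoothed conditional log-probabilities, precomputed per state
        self.logp = {
            h: {w: math.log((ws.get(w, 0) + 1) / (totals[h] + len(self.vocab)))
                for w in self.vocab}
            for h, ws in counts.items()
        }

    def score(self, state, word):
        """Log-probability of `word` given the LM state, plus the next state
        (for a bigram model the next state is simply the word itself)."""
        return self.logp[state][word], word

    def lookahead(self, state):
        """Scores for every word in the vocabulary given `state` -- the bulk
        query that makes LM look-ahead cheap compared to per-word calls."""
        return self.logp[state]

lm = NgramLM({("the", "cat"): 3, ("the", "dog"): 1})
lp, next_state = lm.score("the", "cat")
scores = lm.lookahead("the")
best = max(scores, key=scores.get)
```

Precomputing the full per-state distribution is what makes the bulk query worthwhile: the decoder can bound the best continuation from any state in one lookup instead of one call per vocabulary word.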