2014
DOI: 10.1007/s10579-014-9277-0

An overview of the European Union’s highly multilingual parallel corpora

Cited by 33 publications
(18 citation statements)
References 14 publications
“…For a broader domain coverage of datasets necessary to train an SMT system, we merged several parallel corpora, e.g. JRC-Acquis [3], Europarl [4], DGT (translation memories generated by the Directorate-General for Translation) [5], MultiUN corpus [6] and TED talks [7] among others, into one parallel dataset. For the translation approach, we engage the widely used Moses toolkit [8].…”
Section: Statistical Machine Translation
mentioning
confidence: 99%
“…These sentences were extracted from different parallel corpora: News Commentary (Bojar et al, 2013) for Spanish, the Basque Public Administration Institute translation memory for Basque, the corpus from the official journal of the Catalan Government (Tiedemann, 2012) for Catalan, and the DGT translation memory (Steinberger et al, 2014) for Maltese. A set of 174,441 segments were randomly extracted from each of these corpora.…”
Section: Data
mentioning
confidence: 99%
“…For a broader domain coverage of an SMT system, we merged several parallel corpora necessary to train an SMT system, e.g. JRC-Acquis [13], Europarl [14], DGT (translation memories generated by the Directorate-General for Translation) [15], MultiUN corpus [16] and TED talks [17] among others, into one parallel dataset. For the translation approach, the OTTO System engages the widely used Moses toolkit [18].…”
Section: Statistical Machine Translation
mentioning
confidence: 99%