2011
DOI: 10.1007/s10590-011-9090-0

Apertium: a free/open-source platform for rule-based machine translation

Abstract: Apertium is a free/open-source platform for rule-based machine translation. It is being widely used to build machine translation systems for a variety of language pairs, especially in those cases (mainly with related-language pairs) where shallow transfer suffices to produce good quality translations, although it has also proven useful in assimilation scenarios with more distant pairs involved. This article summarises the Apertium platform: the translation engine, the encoding of linguistic data, and the tools…

Cited by 177 publications (159 citation statements)
References 16 publications
“…For actual morphological analysis, we use the lttoolbox formalism, a morphological analysis framework used within the open-source machine translation framework, Apertium (Forcada et al, 2011). There exists an lttoolbox-based morphological analyser for Marathi², with a coverage of 80% on the Marathi Wikipedia.…”
Section: Lttoolbox (mentioning)
confidence: 99%
“…The parallel corpora exploited for all the language pairs include: Common Crawl mined from the public web crawl hosted on Amazon's Elastic Cloud [59], EuroParl version 6 extracted from the proceedings of the EU Parliament [11], JRC-Acquis Multilingual Parallel Corpus version 3.0 extracted from Acquis Communautaire, the total body of European Union law [13], the News Commentary corpus of news analysis from the Project Syndicate [14], and the OJEU corpus with texts from the Official Journal of the European Union including legislation documents, information notices, and public procurements, made available by the Apertium project [60]. We also make use of the dictionary data extracted from DBpedia and not identified as medical-domain, see Section 2.2.1.…”
Section: General-domain Parallel Data (mentioning)
confidence: 99%
“…The words originating from other languages (less than 0.1%, and principally Latin) are labelled solely with their language. The part-of-speech categories and their tags (shown in Table 1 together with their relative frequencies) are based on those defined by the Apertium machine translation platform (Forcada et al, 2011) for the dictionaries in the Spanish-Catalan language pair since, to the best of our knowledge, there is no Spanish diachronic lexicon available in digital form under a free/open-source license. Having a lexicon in digital form facilitated the manual annotation of the corpus as described below.…”
Section: Corpus Description (mentioning)
confidence: 99%