2017
DOI: 10.2197/ipsjjip.25.88
Inflating a Small Parallel Corpus into a Large Quasi-parallel Corpus Using Monolingual Data for Chinese-Japanese Machine Translation

Abstract: Increasing the size of parallel corpora for less-resourced language pairs is essential for machine translation (MT). To address the shortage of parallel corpora between Chinese and Japanese, we propose a method to construct a quasi-parallel corpus by inflating a small amount of Chinese-Japanese corpus, so as to improve statistical machine translation (SMT) quality. We generate new sentences using analogical associations based on large amounts of monolingual data and a small amount of parallel data. We filter o…
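The abstract's core mechanism, generating new sentences by analogical association, rests on solving proportional analogies of the form A : B :: C : D over strings. As a minimal sketch (not the paper's implementation, which operates over analogical clusters mined from large monolingual corpora), the hypothetical helper below solves simple prefix- or suffix-substitution analogies:

```python
def lcp(x, y):
    """Length of the longest common prefix of strings x and y."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n

def solve_analogy(a, b, c):
    """Solve a : b :: c : d when a -> b is a prefix or suffix substitution.

    Illustrative only: real analogical-cluster systems use richer string
    (and character-level) analogy solvers. Returns None if no simple
    substitution pattern applies.
    """
    # Case 1: a and b share a prefix and differ in their suffixes,
    # e.g. walk : walked :: talk : talked.
    p = lcp(a, b)
    suf_a, suf_b = a[p:], b[p:]
    if not suf_a:                       # a is a prefix of b: pure suffix addition
        return c + suf_b
    if c.endswith(suf_a):
        return c[: -len(suf_a)] + suf_b
    # Case 2: a and b share a suffix and differ in their prefixes,
    # e.g. unhappy : happy :: unlucky : lucky.
    s = lcp(a[::-1], b[::-1])
    pre_a, pre_b = a[: len(a) - s], b[: len(b) - s]
    if pre_a and c.startswith(pre_a):
        return pre_b + c[len(pre_a):]
    return None
```

Applying such transformations to monolingual sentences on both the Chinese and the Japanese side, and then pairing the outputs, is one way new quasi-parallel sentence pairs can be produced before filtering.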

Cited by 2 publications (1 citation statement)
References 29 publications
“…This technique has never been applied to example-based machine translation, but it has been used for statistical machine translation to create new pairs of aligned sentences (in CBR terms, source problems and their solutions) so as to augment the training data (the case base in CBR terms) [41]. Analogical clusters identify well attested transformations, which should thus be reliable.…”
Section: Taking Into Account Retrieval Knowledge
Confidence: 99%