The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012) 2012
DOI: 10.1109/aisp.2012.6313739

State-of-the-art English to Persian Statistical Machine Translation system

Abstract: This paper reports a comparison of several kinds of English-Persian Statistical Machine Translation (SMT) systems. A large parallel corpus containing about 6 million tokens on each side has been developed for training the proposed SMT system. In developing the parallel corpus, a noise-filtering system based on a MaxEnt classifier was devised to distinguish between correct and incorrect sentence pairs. By using the generated parallel corpus, a variety of SMT systems on English to Persian languages has be…

Cited by 12 publications (1 citation statement)
References: 16 publications
“…Table 2 shows the statistics of Hamshahri corpus. The 20M parallel corpus is constructed from four different parallel corpora: Roman parallel corpus (Mansouri and Faili 2012), Iran Telecommunication Research Center parallel corpus (Jabbari et al 2012), European Language Resources Association English–Persian parallel corpus (Mosavi Miangah 2009), and a part of Mizan parallel corpus. This corpus consists of 1,109,584 aligned sentences, which have about 20,000,000 words on each side.…”
Section: Data Sets and Experimental Results (citation type: mentioning)
confidence: 99%