2019
DOI: 10.3844/jcssp.2019.1627.1637

Neural Machine Translation for Low-resource English-Bangla

Abstract: Islam. This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license.

Citations: cited by 17 publications (12 citation statements)
References: 27 publications (36 reference statements)
“…A few parallel corpora are available for Bangla-English MT. In this study, SUPara (Al Mumin et al, 2012) dataset is used as a number of recent studies have used this corpus (Al Mumin et al, 2019a, 2019b; Hasan et al, 2019a, 2019b). The dataset contains 70861, 500 and 500 parallel sentences for training, validation and test sets, respectively.…”
Section: Benchmark Data and Preprocessing (mentioning)
confidence: 99%
“…In our experiment, we used Shahjalal University parallel (SUPara) (Mumin et al, 2012; 2018b) corpus and GlobalVoices (Tiedemann, 2012) corpus from OPUS (Tiedemann, 2012) as a training dataset. SUPara (Mumin et al, 2012; 2018b) is a balanced corpus consisting of texts from different genres like literature, journalistic texts, instructive texts, administrative texts, and texts treating external communication, which are collected from various printed and online media. GlobalVoices (Tiedemann, 2012) corpus consists of only news texts collected from GlobalVoices website.…”
Section: Dataset (mentioning)
confidence: 99%
“…These two datasets were developed with a vision of using them as a benchmark in English-Bangla MT research. The texts of these two datasets were well-chosen from balanced SUPara (Mumin et al, 2012;2018b) corpus, thus these two datasets are also balanced in genre. In addition, to make these datasets representative in length we selected the texts from 10 subsets of different lengths: 1 to 5 words, 6 to 10 and so forth up to 40 to 45 and finally longer than 45 words.…”
Section: Dataset (mentioning)
confidence: 99%
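
The length-based selection described in the statement above can be illustrated with a small sketch. This is not the citing authors' code: the function names `length_bucket` and `group_by_length` are hypothetical, whitespace tokenization is an assumption, and the last bounded bin is taken as 41 to 45 words (the quoted text says "40 to 45") so that the ten bins do not overlap.

```python
from collections import defaultdict


def length_bucket(sentence: str, width: int = 5, max_len: int = 45) -> str:
    """Return the label of the length bin a sentence falls into.

    Assumption: words are separated by whitespace. Bins are
    1-5, 6-10, ..., 41-45, plus a final open-ended bin ">45".
    """
    n_words = len(sentence.split())
    if n_words == 0:
        raise ValueError("empty sentence has no length bin")
    if n_words > max_len:
        return f">{max_len}"
    # Lower edge of the bin: 1, 6, 11, ... for width 5.
    low = ((n_words - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def group_by_length(sentences):
    """Group sentences into their length bins (hypothetical helper)."""
    buckets = defaultdict(list)
    for sentence in sentences:
        buckets[length_bucket(sentence)].append(sentence)
    return dict(buckets)
```

Sampling an equal number of sentences from each of the resulting groups would be one way to obtain a test set that is representative in length, under the assumptions stated above.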