2023
DOI: 10.1145/3486677
Unsupervised Parallel Sentences of Machine Translation for Asian Language Pairs

Abstract: Parallel sentence pairs play a very important role in many natural language processing (NLP) tasks, especially cross-lingual tasks such as machine translation. So far, many Asian language pairs lack bilingual parallel sentences. Because collecting bilingual parallel data is time-consuming and difficult, this shortage is a serious problem for many low-resource Asian language pairs. While existing methods have shown encouraging results, they either rely heavily on bilingual data or have drawbacks in unsupervised settings.…

Cited by 5 publications (4 citation statements)
References 32 publications (45 reference statements)
“…XiaYang et al [9] effectively improved the quality of extracted sentence pairs by using a very small seed lexicon (about hundreds of entries) during the process of learning cross-lingual word representations. ShaoLin et al [10] proposed a new unsupervised method for obtaining parallel sentence pairs by mapping bilingual word embeddings through post-hoc adversarial training and introducing a new cross-domain similarity adaptation. YuSun et al [11] proposed an approach based on transfer learning to mine parallel sentences in an unsupervised setting, which utilizes bilingual corpora of rich-resource language pairs to mine parallel sentences without bilingual supervision of low-resource language pairs.…”
Section: Related Work
confidence: 99%
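The cross-domain similarity adaptation mentioned above is commonly realized as CSLS-style scoring: cosine similarity between embeddings in a shared cross-lingual space, penalized by each point's average similarity to its nearest neighbors so that "hub" candidates do not dominate. A minimal sketch with toy sentence embeddings follows; the data, dimensions, and `k` value are illustrative assumptions, not taken from the cited papers:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity matrix between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def csls_scores(src_emb, tgt_emb, k=2):
    # CSLS: subtract each side's mean similarity to its k nearest
    # neighbors, penalizing candidates that sit in dense "hub" regions.
    sim = cosine_sim(src_emb, tgt_emb)
    r_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * sim - r_src - r_tgt

# Toy sentence embeddings in a shared cross-lingual space (3 per side);
# each target is a slightly perturbed copy of the matching source.
rng = np.random.default_rng(0)
src = rng.normal(size=(3, 8))
tgt = src + 0.05 * rng.normal(size=(3, 8))

scores = csls_scores(src, tgt)
pairs = scores.argmax(axis=1)  # best target index for each source sentence
print(pairs)
```

On this toy data the argmax recovers the aligned pairs; in practice the score matrix is thresholded or mutually filtered before accepting a sentence pair.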
“…These methods relied heavily on parallel data, which is unsuitable for low-resource scenarios. Transfer learning and unsupervised learning based on cross-lingual word embeddings and multilingual pre-trained models are the current mainstream research directions for low-resource languages [8][9][10][11][12][13], but they may not be effective for languages with substantial differences.…”
Section: Introduction
confidence: 99%
“…They are particularly useful for machine translation, where the goal is to automatically translate text from one language to another. However, despite their usefulness, parallel corpora are still lacking in many Asian languages [4], which poses a challenge for researchers and developers working on improving multilingual processing. Oco and Roxas have highlighted the issue of insufficient resources as a major setback in research on Philippine languages [5].…”
Section: Introduction
confidence: 99%
“…This data can come in various forms, such as text, images, or numerical values, depending on the task at hand. Next, the data undergoes preprocessing, where it is cleaned, organized, and transformed into a format suitable for analysis [7]. This step may involve handling missing values, normalizing features, or encoding categorical variables [8]. Once the data is prepared, a machine learning model is selected and trained using the preprocessed data.…”
Section: Introduction
confidence: 99%
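The preprocessing steps this statement lists (imputing missing values, normalizing features, one-hot encoding categorical variables) can be sketched in plain Python on hypothetical toy records; the features and values here are illustrative assumptions, not data from the cited work:

```python
# Hypothetical toy records: a numeric "age" feature with one missing
# value and a categorical "color" feature.
rows = [
    {"age": 20.0, "color": "red"},
    {"age": None, "color": "blue"},
    {"age": 40.0, "color": "red"},
]

# 1. Handle missing values: impute the column mean.
ages = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(ages) / len(ages)
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

# 2. Normalize the numeric feature to [0, 1] (min-max scaling).
lo, hi = min(r["age"] for r in rows), max(r["age"] for r in rows)
for r in rows:
    r["age"] = (r["age"] - lo) / (hi - lo)

# 3. One-hot encode the categorical feature.
categories = sorted({r["color"] for r in rows})
features = [[r["age"]] + [1.0 if r["color"] == c else 0.0 for c in categories]
            for r in rows]
print(features)  # → [[0.0, 0.0, 1.0], [0.5, 1.0, 0.0], [1.0, 0.0, 1.0]]
```

Each row becomes a fixed-length numeric vector, which is the format most model-training APIs expect.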