2023
DOI: 10.1145/3486677
Unsupervised Parallel Sentences of Machine Translation for Asian Language Pairs

Abstract: Parallel sentence pairs play a very important role in many natural language processing (NLP) tasks, especially cross-lingual tasks such as machine translation. So far, many Asian language pairs lack bilingual parallel sentences. Because collecting bilingual parallel data is time-consuming and difficult, this shortage is a serious problem for many low-resource Asian language pairs. While existing methods have shown encouraging results, they either rely heavily on bilingual data or have drawbacks in unsupervised settings.…

Cited by 5 publications (4 citation statements)
References 32 publications (45 reference statements)
“…XiaYang et al [9] effectively improved the quality of extracted sentence pairs by using a very small seed lexicon (about hundreds of entries) during the process of learning cross-lingual word representations. ShaoLin et al [10] proposed a new unsupervised method for obtaining parallel sentence pairs by mapping bilingual word embeddings through post-hoc adversarial training and introducing a new cross-domain similarity adaptation. YuSun et al [11] proposed an approach based on transfer learning to mine parallel sentences in an unsupervised setting, which utilizes bilingual corpora of rich-resource language pairs to mine parallel sentences without bilingual supervision of low-resource language pairs.…”
Section: Related Work
confidence: 99%
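The cross-domain similarity adaptation mentioned above is commonly realized as CSLS-style scoring: cosine similarity between embeddings in a shared cross-lingual space, penalized by each point's average similarity to its nearest neighbors so that "hub" candidates do not dominate. A minimal sketch with toy sentence embeddings follows; the data, dimensions, and `k` value are illustrative assumptions, not taken from the cited papers:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity matrix between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def csls_scores(src_emb, tgt_emb, k=2):
    # CSLS: subtract each side's mean similarity to its k nearest
    # neighbors, penalizing candidates that sit in dense "hub" regions.
    sim = cosine_sim(src_emb, tgt_emb)
    r_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * sim - r_src - r_tgt

# Toy sentence embeddings in a shared cross-lingual space (3 per side);
# each target is a slightly perturbed copy of the matching source.
rng = np.random.default_rng(0)
src = rng.normal(size=(3, 8))
tgt = src + 0.05 * rng.normal(size=(3, 8))

scores = csls_scores(src, tgt)
pairs = scores.argmax(axis=1)  # best target index for each source sentence
print(pairs)
```

On this toy data the argmax recovers the aligned pairs; in practice the score matrix is thresholded or mutually filtered before accepting a sentence pair.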
“…These methods relied heavily on parallel data, which is unsuitable for low-resource scenarios. Transfer learning and unsupervised learning based on cross-lingual word embeddings and multilingual pre-trained models are the current mainstream research directions for low-resource languages [8][9][10][11][12][13], but they may not be effective for languages with substantial differences.…”
Section: Introduction
confidence: 99%
“…They are particularly useful for machine translation, where the goal is to automatically translate text from one language to another. However, despite their usefulness, parallel corpora are still lacking in many Asian languages [4], which poses a challenge for researchers and developers working on improving multilingual processing. Oco and Roxas have highlighted the issue of insufficient resources as a major setback in research on Philippine languages [5].…”
Section: Introduction
confidence: 99%
“…This data can come in various forms, such as text, images, or numerical values, depending on the task at hand. Next, the data undergoes preprocessing, where it is cleaned, organized, and transformed into a format suitable for analysis [7]. This step may involve handling missing values, normalizing features, or encoding categorical variables [8]. Once the data is prepared, a machine learning model is selected and trained using the preprocessed data.…”
Section: Introduction
confidence: 99%
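The preprocessing steps this statement lists (imputing missing values, normalizing features, one-hot encoding categorical variables) can be sketched in plain Python on hypothetical toy records; the features and values here are illustrative assumptions, not data from the cited work:

```python
# Hypothetical toy records: a numeric "age" feature with one missing
# value and a categorical "color" feature.
rows = [
    {"age": 20.0, "color": "red"},
    {"age": None, "color": "blue"},
    {"age": 40.0, "color": "red"},
]

# 1. Handle missing values: impute the column mean.
ages = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(ages) / len(ages)
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

# 2. Normalize the numeric feature to [0, 1] (min-max scaling).
lo, hi = min(r["age"] for r in rows), max(r["age"] for r in rows)
for r in rows:
    r["age"] = (r["age"] - lo) / (hi - lo)

# 3. One-hot encode the categorical feature.
categories = sorted({r["color"] for r in rows})
features = [[r["age"]] + [1.0 if r["color"] == c else 0.0 for c in categories]
            for r in rows]
print(features)  # → [[0.0, 0.0, 1.0], [0.5, 1.0, 0.0], [1.0, 0.0, 1.0]]
```

Each row becomes a fixed-length numeric vector, which is the format most model-training APIs expect.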