2021
DOI: 10.5715/jnlp.28.380
Japanese–English Conversation Parallel Corpus for Promoting Context-aware Machine Translation Research

Abstract: Most machine translation (MT) research has focused on sentences as translation units (sentence-level MT), and has achieved acceptable translation quality for sentences where cross-sentential context is not required, mainly in high-resourced languages. Recently, many researchers have worked on MT models that can consider cross-sentential context. These models are often called context-aware MT or document-level MT models. Document-level MT is difficult to 1) train with a small amount of document-level data; and 2…

Cited by 8 publications (3 citation statements)
References 13 publications
“…For instance, some improvements obtained with context-aware models, as measured by standard translation metrics, may be attributed to context-driven regularisation acting as a noise generator, particularly with small-scale data (Kim et al., 2019; Li et al., 2020). Nonetheless, several studies have established that context information can indeed be effectively modelled to tackle discursive phenomena in NMT beyond the sentence level (Liu and Zhang, 2020; Rikters and Nakazawa, 2021; Xu et al., 2021; Mansimov et al., 2021; Gete et al., 2023).…”
Section: Related Work
confidence: 99%
“…Context modelling in NMT is typically performed over a fixed window of preceding or following context sentences (Zhang et al., 2018; Voita et al., 2018, 2019b; Yang et al., 2019; Rikters and Nakazawa, 2021). This is due in part to the challenges associated with long-distance context modelling, resulting in degraded translation quality on long-range context (Junczys-Dowmunt, 2019; Tan et al., 2019; Zheng et al., 2020; Sun et al., 2020).…”
Section: Related Work
confidence: 99%
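The fixed-window approach described in the statement above can be sketched as a simple input-construction step: the current source sentence is concatenated with a bounded number of preceding sentences, joined by a separator token. This is a minimal illustration of the general technique, not code from any of the cited papers; the function name, separator token, and window size are hypothetical choices.

```python
def build_context_input(sentences, index, window=2, sep="<sep>"):
    """Build a context-augmented source input for sentence `index`.

    Concatenates up to `window` preceding sentences with the current
    sentence, separated by a special token, as in typical fixed-window
    context-aware NMT input construction.
    """
    start = max(0, index - window)
    context = sentences[start:index]
    return f" {sep} ".join(context + [sentences[index]])


doc = [
    "Did you see Ken?",
    "He left an hour ago.",
    "I wanted to ask him something.",
]
# Translating the third sentence with two sentences of preceding context:
print(build_context_input(doc, 2, window=2))
# -> "Did you see Ken? <sep> He left an hour ago. <sep> I wanted to ask him something."
```

Limiting `window` keeps input length bounded, which reflects the long-distance modelling difficulty the cited studies report: context beyond the window is simply unavailable to the model.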
“…In the Japanese passive statistical machine translation model, the parameters of the corresponding passive constructions and the parameters of the text to be translated are trained jointly so that decoding produces the search result with the maximum probability. When different types of passives are decoded, the optimal translation path is finally formed [12].…”
Section: Discriminant Model
confidence: 99%