2022
DOI: 10.3390/app12126002

WCC-JC: A Web-Crawled Corpus for Japanese-Chinese Neural Machine Translation

Abstract: Currently, there are only a limited number of Japanese–Chinese bilingual corpora of a sufficient amount that can be used as training data for neural machine translation (NMT). In particular, there are few corpora that include spoken language such as daily conversation. In this research, we attempt to construct a Japanese–Chinese bilingual corpus of a certain scale by crawling the subtitle data of movies and TV series from the websites. We calculated the BLEU scores of the constructed WCC-JC (Web Crawled Corpus…

Cited by 7 publications (4 citation statements)
References 19 publications
“…This traditional translation process was time-consuming [17]. Zhang et al. used web crawlers to obtain relevant information required for the text in the translation process, professionally classified the translated text, and established a professional data corpus to facilitate the translation work [18]. Shreffler et al. conducted an experimental design on the preparation of translation work and found that MT could perform pretranslation work [19].…”
Section: Introduction (mentioning)
confidence: 99%
“…It is no exaggeration to say that after entering the 21st century, almost everyone who lives in the information network era has to deal with machine translation directly or indirectly. No matter in science and technology, business, or politics, machine translation is undoubtedly a very important practical subject [3,4]. The traditional machine translation method uses pipeline successive operations to mark the part of speech and analyze the syntax of the original corpus, so as to obtain the syntax structure of English language, which leads to the iterative transmission of errors between translation tasks and the reduction of the accuracy of structured examples, resulting in the reduction of the accuracy of English language and literature translation [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Liu et al [29] introduced DuRecDial 2.0, a bilingual parallel dialog dataset for English and Chinese, aimed at advancing monolingual, multilingual, and cross-lingual conversational recommendation systems, showcasing the benefits of incorporating additional English data for Chinese conversational recommendations. Furthermore, Zhang et al [30,31] made significant contributions by developing the WCC-JC Japanese-Chinese translation corpus and the manually aligned WCC-JC 2.0, a large-scale Japanese-Chinese parallel corpus, through web crawling, providing considerable support for Japanese-Chinese translation research.…”
Section: Corpus Construction (mentioning)
confidence: 99%