2021
DOI: 10.1007/s42979-021-00723-4

A Survey of Orthographic Information in Machine Translation

Abstract: Machine translation is one of the applications of natural language processing and has been explored in different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread underlying problem for these machine translation systems is the linguistic difference and variation in orthographic conventions, which causes many issues for traditional approaches. Two languages written in two different orthographies are no…

Cited by 12 publications (9 citation statements)
References: 104 publications
“…According to incomplete statistics, there are now around 7,000 human languages [ 1 ]. Most of the current machine translation technology is based on big data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Only by training on a large amount of data can we get a better effect. In fact, only a few languages such as Chinese, English, French, and German have more training data, and there are almost no training resources available for other languages [ 1 ]. In some specialized fields, such as marine science and technology, there are fewer resources in Chinese and other languages.…”
Section: Introduction (mentioning)
confidence: 99%
“…The performance of NMT schemes even in the case of noisy data could be improved with Fuzzy match retrieval method combined with source-target concatenation [42]. In works [43-50], CAT tools for translating into Indian languages were presented. In Reference 50, the authors presented an open source and extendable Morphological Analyser cum Generator (MAG) for Tamil language.…”
Section: Related Work (mentioning)
confidence: 99%
“…This notation (like many other similar notations) can project all the Indic or Brahmi origin scripts [40], which have, in many cases, different Unicode blocks, into a common character space (footnote 1: using encoding converters, such as https://pypi.org/project/wxconv/). Our intuition is that this should help in capturing phonological, orthographic, and, to some extent, morphosyntactic similarities that will help a neural network-based model in better multilingual learning and translation across these languages [38,39,67]. We do this by using this WX-converted text to learn byte pair encoding-based embeddings.…”
Section: Introduction (mentioning)
confidence: 99%
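
The idea in the last citation statement, projecting scripts that occupy different Unicode blocks into one shared character space, can be illustrated with a small sketch. This is not WX notation and not the cited authors' method: it is a simpler stand-in that relies on the fact that the major Indic Unicode blocks share an ISCII-derived layout, so a character can be shifted to the same offset in another block. The function name `to_common_space` and the choice of Devanagari as the target are assumptions made for illustration.

```python
# Start code point of each script's Unicode block. Each block is 0x80
# code points wide and follows a broadly parallel (ISCII-derived) layout.
BLOCK_STARTS = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def to_common_space(text, target="devanagari"):
    """Shift every Indic character to the same offset in the target block.

    Hypothetical helper: characters outside the listed blocks are left
    unchanged. The mapping is approximate, since not every offset is
    assigned in every script.
    """
    base = BLOCK_STARTS[target]
    out = []
    for ch in text:
        cp = ord(ch)
        for start in BLOCK_STARTS.values():
            if start <= cp < start + BLOCK_SIZE:
                out.append(chr(base + (cp - start)))
                break
        else:
            # Not an Indic-block character: keep as-is.
            out.append(ch)
    return "".join(out)


# Bengali KA (U+0995) and Telugu KA (U+0C15) both land on Devanagari KA (U+0915).
print(to_common_space("\u0995") == to_common_space("\u0C15"))
```

After such a projection, cognate words written in different scripts share characters, which is the property the citing paper wants before learning subword (byte pair encoding) units. A real pipeline would more likely use a dedicated converter such as the one the footnote links to.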