Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Confere 2015
DOI: 10.3115/v1/p15-2021

Lexicon Stratification for Translating Out-of-Vocabulary Words

Abstract: A language lexicon can be divided into four main strata, depending on origin of words: core vocabulary words, fully- and partially-assimilated foreign words, and unassimilated foreign words (or transliterations). This paper focuses on translation of fully- and partially-assimilated foreign words, called "borrowed words". Borrowed words (or loanwords) are content words found in nearly all languages, occupying up to 70% of the vocabulary. We use models of lexical borrowing in machine translation as a pivoting mechan…

Cited by 15 publications
(13 citation statements)
References 19 publications
“…Finally, as observed in the previous section, there are many Out of Vocabulary OOV words in any language in the world and many researchers, including Tsvetkov & Dyer (2015), concede that transliteration should be the way to go when attempting to translate texts that contain these OOVs. The same can be applied to proper nouns according to Habash (2008) who also believes that transliteration is the proper course of action when it comes to translating texts containing such words.…”
Section: Transliteration and Arabization (mentioning)
confidence: 81%
“…According to Tsvetkov & Dyer (2015), transliterated words are among the four categories of vocabulary in a language and they call these words unassimilated. The remaining three being core words of the language, assimilated and semi-assimilated.…”
Section: Transliteration and Arabization (mentioning)
confidence: 99%
“…Lexical borrowing has received relatively little attention in natural language processing area. Tsvetkov and Dyer [7] proposed a morph-phonological transformation model to obtain good performance at predicting donor forms from borrowed forms. Tsvetkov et al [7] suggested to use the lexical borrowing as a model in an SMT framework to translate OOV words.…”
Section: Loanword Identification (mentioning)
confidence: 99%
“…Cognates and the problem of cognate identification have been extensively studied in the fields of language typology and historical linguistics, as cognates are considered useful for researching the relatedness of languages (Bhattacharya et al, 2018). Cognates are also used in computational linguistics, e.g., for lexicon extension (Wu and Yarowsky, 2018) or to improve cross-lingual NLP tasks such as machine translation or bilingual word recognition (Kondrak et al, 2003;Tsvetkov and Dyer, 2015).…”
Section: Introduction (mentioning)
confidence: 99%