A Database and Visualization of the Similarity of Contemporary Lexicons

Bella, Gábor; Batsuren, Khuyagbaatar; Giunchiglia, Fausto

doi:10.1007/978-3-030-83527-9_8

Cited by 8 publications

(5 citation statements)

References 11 publications

“…( Notes: ∆ Imm Lang Same is the change in the share of immigrants speaking Spanish (All of America, except Brazil). ∆ Imm Lang Med and ∆ Imm Lang Far is the change in the share of immigrants speaking languages relatively close or far from Spanish based on a lexical similarity score (Bella et al 2021). Table A15 shows the list of countries and similarity scores in each group.…”

Section: Discussion

mentioning

confidence: 99%

“…Language is one of many dimensions of culture, and research suggests that cultural distance between different groups in a society may negatively affect their willingness to redistribute or provide public goods (Luttmer and Singhal 2011, Desmet et al 2009). I test for this channel by borrowing a measure of language similarity between Spanish and immigrant's countries of origin main (most spoken) language from Bella et al (2021). Based on that measure, I divide immigrants between those speaking the same language (All America except Brazil), a relatively similar language, or a relatively distant language.…”

Section: Mechanisms and Effects On Anti-immigration Platforms

mentioning

confidence: 99%

“…Language similarity between Spanish and the most represented immigrant groups Similarity is the lexical similarity index (0-100) between Spanish and a given language computed in Bella et al (2021). * Country's most spoken language is not available in the data.…”

mentioning

confidence: 99%

Neighborhoods, Perceived Immigration, and Preferences for Redistribution: Evidence from Barcelona

Domènech-Arumí

2023

SSRN Journal

“…To explain where the differences across languages come from, we compute how the differences correlate with the geographical distance of the countries where the languages are spoken, the GDP of the countries, and the lexical similarity of the languages (Bella et al, 2021).…”

Section: Discussion

mentioning

confidence: 99%

On the Language Neutrality of Pre-trained Multilingual Representations

Libovický, Rosa, Fraser

2020

Findings of the Association for Computational Linguistics: EMNLP 2020

Multilingual contextual embeddings, such as multilingual BERT and XLM-RoBERTa, have proved useful for many multi-lingual tasks. Previous work probed the cross-linguality of the representations indirectly using zero-shot transfer learning on morphological and syntactic tasks. We instead investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics. Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings, which are explicitly trained for language neutrality. Contextual embeddings are still only moderately language-neutral by default, so we propose two simple methods for achieving stronger language neutrality: first, by unsupervised centering of the representation for each language and second, by fitting an explicit projection on small parallel data. Besides, we show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences without using parallel data.

“…The Universal Knowledge Core (UKC) [24,25] is a large-scale MLDB that contains about 2 million words in over 2,000 languages [8]. It integrates a variety of resources such as individual wordnets such as [23,9], Wiktionary, as well as original multilingual content on phenomena related to linguistic diversity [24], such as cognacy [5], metonymy [30], lexical gaps [29], morphology [4], lexical similarity [6]. The UKC has a two-layered architecture, with a language layer that contains a separate wordnet-like graph (with words, senses, and synsets) for each language, as well as a supra-lingual layer of interlingual concepts [25] (Figure 4).…”

Section: The Universal Knowledge Core

mentioning

confidence: 99%

Representing Interlingual Meaning in Lexical Databases

Giunchiglia, Bella, Nair et al.

2023

Preprint

In today's multilingual lexical databases, the majority of the world's languages are under-represented. Beyond a mere issue of resource incompleteness, we show that existing lexical databases have structural limitations that result in a reduced expressivity on culturally-specific words and in mapping them across languages. In particular, the lexical meaning space of dominant languages, such as English, is represented more accurately while linguistically or culturally diverse languages are mapped in an approximate manner. Our paper assesses state-of-the-art multilingual lexical databases and evaluates their strengths and limitations with respect to their expressivity on lexical phenomena of linguistic diversity.
