Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.814
XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

Abstract: Cross-lingual named-entity lexica are an important resource for multilingual NLP tasks such as machine translation and cross-lingual wikification. While knowledge bases contain a large number of entities in high-resource languages such as English and French, corresponding entities for lower-resource languages are often missing. To address this, we propose Lexical-Semantic-Phonetic Align (LSP-Align), a technique to automatically mine cross-lingual entity lexica from mined web data. We demonstrate LSP-Align outpe…
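The abstract describes LSP-Align as combining lexical, semantic, and phonetic signals to align entity mentions across languages. As a rough illustration only, the sketch below scores a candidate source/target entity pair by mixing three such signals; the helper functions, the consonant-skeleton phonetic proxy, the dummy embeddings, and the weights are all assumptions made for this example and are not the authors' LSP-Align implementation.

```python
# Illustrative sketch (not the paper's method): combine lexical, semantic,
# and phonetic similarity into one alignment score for an entity pair.
from difflib import SequenceMatcher
import math

def lexical_sim(src: str, tgt: str) -> float:
    """Surface-string similarity (character overlap) in [0, 1]."""
    return SequenceMatcher(None, src.lower(), tgt.lower()).ratio()

def phonetic_sim(src: str, tgt: str) -> float:
    """Crude phonetic proxy: compare consonant skeletons of both strings."""
    def skeleton(s: str) -> str:
        return "".join(c for c in s.lower() if c.isalpha() and c not in "aeiou")
    return SequenceMatcher(None, skeleton(src), skeleton(tgt)).ratio()

def semantic_sim(src_vec, tgt_vec) -> float:
    """Cosine similarity between pre-computed cross-lingual embeddings."""
    dot = sum(a * b for a, b in zip(src_vec, tgt_vec))
    norm = math.sqrt(sum(a * a for a in src_vec)) * math.sqrt(sum(b * b for b in tgt_vec))
    return dot / norm if norm else 0.0

def lsp_score(src, tgt, src_vec, tgt_vec,
              w_lex=0.3, w_sem=0.4, w_phon=0.3) -> float:
    """Weighted mix of the three signals (weights are illustrative, not tuned)."""
    return (w_lex * lexical_sim(src, tgt)
            + w_sem * semantic_sim(src_vec, tgt_vec)
            + w_phon * phonetic_sim(src, tgt))

# Toy usage: English "London" vs. Spanish "Londres", with dummy embeddings.
print(round(lsp_score("London", "Londres", [0.2, 0.9, 0.1], [0.25, 0.85, 0.15]), 3))
```

In practice the semantic component would come from cross-lingual word or entity embeddings and the phonetic component from a proper transliteration or phoneme model; the stand-ins above only show how the three signals could be weighted into a single score.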

Cited by 5 publications (1 citation statement)
References 22 publications
“…For each corpus, a native speaker of each included language variety was asked to label a random sample of 50 texts (or parallel texts, in CCAligned and WikiMatrix) according to the labeling scheme and guidelines presented in Kreutzer et al. (2022). (Footnote 14: https://iso639-3.sil.org/request/2008-040. Footnote 15: We do not include XLEnt (El-Kishky et al., 2021) since it comprises cross-lingual named entities rather than texts.)…”
Section: Little Attention To Representativeness (citation type: mentioning)
confidence: 99%