Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing 2017
DOI: 10.18653/v1/d17-1267

Identifying Cognate Sets Across Dictionaries of Related Languages

Abstract: We present a system for identifying cognate sets across dictionaries of related languages. The likelihood of a cognate relationship is calculated on the basis of a rich set of features that capture both phonetic and semantic similarity, as well as the presence of regular sound correspondences. The similarity scores are used to cluster words from different languages that may originate from a common protoword. When tested on the Algonquian language family, our system detects 63% of cognate sets while maintaining…

Cited by 29 publications (24 citation statements)
References 19 publications

“…The task of cognate detection, i.e., the search for genetically related words in different languages, has traditionally been regarded as a task that is barely automatable. During the last decades, however, automatic cognate detection approaches since Covington (1996) have been constantly improved following the work of Kondrak (2002), both regarding the quality of the inferences (List et al, 2017b;, and the sophistication of the methods (Hauer and Kondrak, 2011; Rama, 2016;, which have been expanded to account for the detection of partial cognates (List et al, 2016b), language specific sound-transition weights (List, 2012) or the search of cognates in whole dictionaries (St Arnaud et al, 2017).…”
Section: Introduction (mentioning)
confidence: 99%
“…Each dataset in this shared task pairs a low-resource (LR) language with a related high-resource (HR) language. Genetically related languages share cognates, words with a common linguistic origin (St Arnaud et al, 2017). For example, the Latin word oculus 'eye' is cognate with the Romanian word ochi.…”
Section: Cognate Projection Methods (mentioning)
confidence: 99%
“…If we decided, for example, that the pattern C in Figure 5 could by no means cluster with E and F, this may well be premature before we have figured out whether the two patterns (u-u-u-u vs. u-u-u-au) are complementary and what phonetic environments explain their complementarity. 13 For automatic cognate detection, compare for example List (2014), List, Greenhill, and Gray (2017), Arnaud, Beck, and Kondrak (2017), and Jäger, List, and Sofroniev (2017), and for automatic phonetic alignment, compare Prokić, Wieling, and Nerbonne (2009) and List (2014). 14 For manual annotation of cognates and alignments, compare List (2017).…”
Section: Implementation Input Format and Output Format (mentioning)
confidence: 99%