Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies 2001
DOI: 10.3115/1073336.1073350

Identifying cognates by phonetic and semantic similarity

Abstract: I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than "orthographic" measures, such as the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient. I introduce a procedure for estimating semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering on avera…
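The two orthographic baselines named in the abstract can be sketched as follows. This is a minimal illustration assuming the common definitions: LCSR as the length of the longest common subsequence divided by the length of the longer word, and Dice's coefficient computed over shared character bigrams. The function names are hypothetical; this is not code from the paper, and it does not implement the feature-based phonetic measure the paper proposes.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping a char
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer word's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty strings score 0
    return lcs_length(a, b) / longest


def bigrams(word: str) -> list[str]:
    """Adjacent character pairs, e.g. 'ht' from 'night'."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice(a: str, b: str) -> float:
    """Dice's coefficient over character bigrams (multiset intersection)."""
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    shared = sum((ba & bb).values())
    return 2 * shared / total


print(lcsr("night", "nacht"))  # 0.6  (LCS "nht" = 3, longer word = 5)
print(dice("night", "nacht"))  # 0.25 (only "ht" shared: 2 * 1 / 8)
```

Both scores range from 0 to 1 and look only at spelling, which is the limitation the paper's phonetic measure is meant to address.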

Cited by 49 publications (45 citation statements); references 7 publications.
“…We employ Arabic-English and Swahili-English bitexts to extract a training set (corpora of sizes 5.4M and 14K sentence pairs, respectively), using a cognate discovery technique (Kondrak, 2001). Phonetically and semantically similar strings are classified as cognates; phonetic similarity is the string similarity between phonetic representations, and semantic similarity is approximated by translation.…”
Section: Resources (mentioning; confidence: 99%)
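The passage above describes classifying candidate pairs by both phonetic and semantic similarity. One simple way to combine the two signals is a weighted sum compared against a threshold; the sketch below assumes that scheme, with a hypothetical weight `alpha`, a hypothetical `threshold`, and precomputed scores in [0, 1]. It is an illustration only, not the procedure of Kondrak (2001) or of the citing paper, and the scores in the example are toy values.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    # Hypothetical container; both scores are assumed precomputed and in [0, 1].
    word_a: str
    word_b: str
    phonetic_sim: float  # e.g. string similarity of the two phonetic transcriptions
    semantic_sim: float  # e.g. 1.0 if the words are aligned as translations, else 0.0


def combined_score(pair: CandidatePair, alpha: float = 0.2) -> float:
    """Weighted sum of the two signals; alpha is a hypothetical weight on semantics."""
    return (1 - alpha) * pair.phonetic_sim + alpha * pair.semantic_sim


def classify_cognates(pairs: list[CandidatePair], alpha: float = 0.2,
                      threshold: float = 0.5) -> list[CandidatePair]:
    """Keep pairs whose combined score reaches the (hypothetical) threshold, best first."""
    kept = [p for p in pairs if combined_score(p, alpha) >= threshold]
    return sorted(kept, key=lambda p: combined_score(p, alpha), reverse=True)


# Toy scores for illustration only; they are not measurements from any paper.
pairs = [
    CandidatePair("night", "nacht", phonetic_sim=0.6, semantic_sim=1.0),
    CandidatePair("dog", "hund", phonetic_sim=0.1, semantic_sim=1.0),
    CandidatePair("water", "wasser", phonetic_sim=0.7, semantic_sim=1.0),
]
for p in classify_cognates(pairs):
    print(p.word_a, p.word_b, round(combined_score(p), 2))
```

Here "dog"/"hund" is a translation pair that sounds dissimilar, so its combined score (0.28) falls below the threshold; the weighting lets phonetic evidence dominate while still rewarding semantic agreement.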
“…Kondrak (2001) uses a modified LD measure to detect cognates, as do Schepens, Dijkstra and Grootjen (2012) on a larger scale, and using only standard orthography. They report a classification performance of over 90%.…”
Section: Related Work (mentioning; confidence: 99%)
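The passage above refers to a "modified LD measure" without specifying the modification. For reference, the sketch below shows plain Levenshtein distance (unit-cost insertion, deletion, substitution) turned into a similarity by normalizing with the longer word's length. This is a common baseline under that assumption, not Kondrak's modified version; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer word, so identical words score 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings are identical
    return 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))          # 3
print(normalized_similarity("night", "nacht"))   # 0.6 (two substitutions over 5)
```

Normalizing by length keeps scores comparable across short and long words; a modified measure would typically change the per-operation costs rather than this overall structure.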
“…Cognates are words in different languages having the same etymology and a common ancestor. The methods for cognate detection proposed so far are mostly based on orthographic/phonetic and semantic similarities (Kondrak, 2001; Frunza et al., 2005), but the term "cognates" is often used with a somewhat different meaning, denoting words with high orthographic/phonetic and cross-lingual meaning similarity, the condition of common etymology being left aside. We focus on etymology and we introduce an automatic strategy for detecting pairs of…”
Section: Relationships Identification (mentioning; confidence: 99%)