Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists

Jäger, Gerhard; List, Johann‐Mattis; Sofroniev, Pavel

doi:10.18653/v1/e17-1113

Cited by 42 publications

(40 citation statements)

References 24 publications


“…Using machine learning algorithms is not new to the field of linguistics, though it is one of the more recent methods. While these approaches are found in an increasing number of studies in linguistics in general, in historical linguistics in particular the method is less used although some studies have been published in this or adjacent fields such as cladistics (Jäger et al., 2017; Jäger and Sofroniev, 2016). Since this approach of predicting sound features by the features in the phonetic environment only works synchronically, the deep neural network used for this needs to be trained on better known phonological features as the basis for predicting unknown features.…”

Section: The Deep Neural Network Approach (mentioning)

confidence: 99%

Predicting Historical Phonetic Features using Deep Neural Networks: A Case Study of the Phonetic System of Proto-Indo-European

Hartmann

2019

Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change


Traditional historical linguistics lacks the possibility to empirically assess its assumptions regarding the phonetic systems of past languages and language stages beyond traditional methods such as comparative tools to gain insights into phonetic features of sounds in proto- or ancestor languages. The paper at hand presents a computational method based on deep neural networks to predict phonetic features of historical sounds where the exact quality is unknown and to test the overall coherence of reconstructed historical phonetic features. The method utilizes the principles of coarticulation, local predictability and statistical phonological constraints to predict phonetic features by the features of their immediate phonetic environment. The validity of this method will be assessed using New High German phonetic data and its specific application to diachronic linguistics will be demonstrated in a case study of the phonetic system of Proto-Indo-European.


“…We created a goldstandard dataset from the data used in [22] (which is drawn from the same sources as the data used in [21] but has been manually edited to correct annotation mistakes). Only the 40 ASJP concepts were used.…”

Section: Creating a Goldstandard (mentioning)

confidence: 99%

Global-scale phylogenetic linguistic inference from lexical resources

Jäger

2018

Sci Data

Self Cite


Automatic phylogenetic inference plays an increasingly important role in computational historical linguistics. Most pertinent work is currently based on expert cognate judgments. This limits the scope of this approach to a small number of well-studied language families. We used machine learning techniques to compile data suitable for phylogenetic inference from the ASJP database, a collection of almost 7,000 phonetically transcribed word lists over 40 concepts, covering two thirds of the extant world-wide linguistic diversity. First, we estimated Pointwise Mutual Information scores between sound classes using weighted sequence alignment and general-purpose optimization. From this we computed a dissimilarity matrix over all ASJP word lists. This matrix is suitable for distance-based phylogenetic inference. Second, we applied cognate clustering to the ASJP data, using supervised training of an SVM classifier on expert cognacy judgments. Third, we defined two types of binary characters, based on automatically inferred cognate classes and on sound-class occurrences. Several tests are reported demonstrating the suitability of these characters for character-based phylogenetic inference.
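
The first step in the abstract above rests on Pointwise Mutual Information between sound classes. As a rough illustration only, the following Python sketch computes PMI from a list of already aligned sound pairs. It is not the procedure of the cited work: the alignment itself and the optimization step are left out, the function name `pmi_scores` and the toy pairs are hypothetical, and the two positions of a pair are counted separately for simplicity.

```python
import math
from collections import Counter


def pmi_scores(aligned_pairs):
    """Return PMI for each observed (a, b) sound pair.

    aligned_pairs: list of (sound_a, sound_b) tuples taken from word
    alignments (assumed input; the alignment step is not shown here).
    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), natural logarithm.
    """
    pair_counts = Counter(aligned_pairs)
    total = sum(pair_counts.values())
    # Marginal counts for the first and second position of each pair.
    left_counts = Counter(a for a, _ in aligned_pairs)
    right_counts = Counter(b for _, b in aligned_pairs)

    scores = {}
    for (a, b), count in pair_counts.items():
        p_joint = count / total
        p_a = left_counts[a] / total
        p_b = right_counts[b] / total
        scores[(a, b)] = math.log(p_joint / (p_a * p_b))
    return scores


# Toy data, invented for illustration.
pairs = [("p", "p"), ("p", "p"), ("p", "f"), ("t", "t")]
scores = pmi_scores(pairs)
```

A positive score means the two sounds are aligned with each other more often than their individual frequencies would predict; a negative score means less often.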


“…B-Cubed scores offer a straightforward way to compare partitioning analyses (or cluster analyses) with each other. In the task of automatic cognate detection in computational historical linguistics, for example, B-Cubed scores are frequently used to compare how well an algorithm performs in comparison with a gold standard (Hauer and Kondrak 2011; Jäger, List, and Sofroniev 2017; List, Greenhill, and Gray 2017).…”

Section: Measuring Differences In Reconstruction Systems (mentioning)

confidence: 99%
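
To make the B-Cubed comparison mentioned in this statement concrete, here is a minimal Python sketch of the standard B-Cubed precision, recall and F-score for two partitions of the same items. The function name `b_cubed`, the dictionary input format and the toy cognate labels are assumptions made for illustration; they are not taken from any of the cited papers or from an existing library.

```python
from collections import defaultdict


def b_cubed(gold, pred):
    """Compute B-Cubed precision, recall and F-score.

    gold, pred: dicts mapping each item to a cluster label.
    Both dicts are assumed to cover exactly the same items.
    """
    gold_clusters = defaultdict(set)
    pred_clusters = defaultdict(set)
    for item in gold:
        gold_clusters[gold[item]].add(item)
        pred_clusters[pred[item]].add(item)

    precision = 0.0
    recall = 0.0
    for item in gold:
        g = gold_clusters[gold[item]]  # items sharing the gold cluster
        p = pred_clusters[pred[item]]  # items sharing the predicted cluster
        overlap = len(g & p)           # always >= 1, the item itself
        precision += overlap / len(p)
        recall += overlap / len(g)

    n = len(gold)
    precision /= n
    recall /= n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example, invented for illustration: four words, two gold cognate sets.
gold = {"hand": 1, "Hand": 1, "main": 2, "mano": 2}
pred = {"hand": "a", "Hand": "a", "main": "a", "mano": "b"}
precision, recall, f_score = b_cubed(gold, pred)
```

Precision drops when a predicted cluster lumps unrelated words together, and recall drops when a gold cognate set is split across predicted clusters; identical partitions score 1.0 on all three measures.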