A Probabilistic Generative Model of Linguistic Typology

Bjerva, Johannes; Kementchedjhieva, Yova; Cotterell, Ryan; Augenstein, Isabelle

doi:10.18653/v1/n19-1156

Cited by 22 publications

(21 citation statements)

References 38 publications

“…Malaviya et al (2017); Murawaki (2017); Bjerva and Augenstein (2018a); Bjerva et al (2019c)), most such work does not take into account that both phylogenetic and geographic proximity should be controlled for. Languages which have shared common ancestry will often have similar typological features, hence training and evaluating on the same language family will tend to inflate the expected performance of the model (Bjerva et al, 2019a). In the data for this shared task, we make sure to control for both of these factors.…”

Section: Evaluation Setup

mentioning

confidence: 99%

“…A survey of approaches to prediction of features is provided in Ponti et al (2019a, § 4.3). Some common approaches include prediction based on language representations learned as a by-product of model training (Östling and Tiedemann, 2017; Malaviya et al, 2017; Bjerva and Augenstein, 2018a; Bjerva et al, 2019c) and matrix factorisation (Murawaki, 2017; Bjerva et al, 2019a).…”

Section: Predicting Typological Features

mentioning

confidence: 99%

SIGTYP 2020 Shared Task: Prediction of Typological Features

Bjerva, Salesky, Mielke, et al. 2020

Proceedings of the Second Workshop on Computational Research in Linguistic Typology

Self Cite

Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world's languages. They have been shown to be useful for downstream applications, including cross-lingual transfer learning and linguistic probing. A major drawback hampering broader adoption of typological KBs is that they are sparsely populated, in the sense that most languages only have annotations for some features, and skewed, in that few features have wide coverage. As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs, which is also the focus of this shared task. Overall, the task attracted 8 submissions from 5 teams, out of which the most successful methods make use of such feature correlations. However, our error analysis reveals that even the strongest submitted systems struggle with predicting feature values for languages where few features are known.

“…In this approach, the languages are represented as random variables that are explained in terms of other languages related to each other through phylogenetic and spatial neighborhood graphs. Bjerva et al (2019) introduce a generative model inspired by the Chomskyan principles-and-parameters framework, drawing on the correlations between typological features of languages to tackle the novel task of typological collaborative filtering, a concept borrowed from the area of recommender systems.…”

Section: Related Work

mentioning

confidence: 99%

NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task

Gutkin, Sproat 2020

Proceedings of the Second Workshop on Computational Research in Linguistic Typology

This paper describes the NEMO submission to SIGTYP 2020 shared task (Bjerva et al., 2020) which deals with prediction of linguistic typological features for multiple languages using the data derived from World Atlas of Language Structures (WALS). We employ frequentist inference to represent correlations between typological features and use this representation to train simple multiclass estimators that predict individual features. We describe two submitted ridge regression-based configurations which ranked second and third overall in the constrained task. Our best configuration achieved the micro-averaged accuracy score of 0.66 on 149 test languages.

“…The availability of comparable treebanks - syntactically annotated corpora - for a growing number of typologically distinct languages (most prominently in the collaborative Universal Dependencies project (Nivre et al, 2016)) has led to a recent surge of interest in computational work aiming to detect systematic patterns in the grammatical systems of natural languages and/or to test hypotheses from theoretical work in language typology against empirical evidence. The treebank-based approach (Liu, 2010; Lochbihler, 2017; Gerdes et al, 2019; Bjerva et al, 2019c; Hahn et al, 2020) adds a more data-driven perspective to a strand of research in computational typology (Daumé and Campbell, 2007; Malaviya et al, 2017; Oncevay et al, 2019; Bjerva et al, 2019a; Bjerva et al, 2019b) that is based on carefully curated typological databases such as WALS (Dryer and Haspelmath, 2013) or URIEL.…”

mentioning

confidence: 99%

“…A major focus has been on (a) detecting universals that have the form of an implication between two typological variables, and (b) predicting the value of unknown features in typological databases based on systematic patterns in attested grammatical systems. Graphical models have been widely used to calculate the strength of an implication (Daumé and Campbell, 2007; Lu, 2013; Bjerva et al, 2019b; Bjerva et al, 2019a). While this approach is suitable if one wants to marginalize out the influence of confounding variables, it also constrains the investigated universals to have the form of an implication consisting of one implicand and usually one (but possibly multiple) implicant(s).…”

mentioning

confidence: 99%

Real-Valued Logics for Typological Universals: Framework and Application

Dönicke, Xiang, Kuhn 2020

Proceedings of the 28th International Conference on Computational Linguistics

This paper proposes a framework for the expression of typological statements which uses real-valued logics to capture the empirical truth value (truth degree) of a formula on a given data source, e.g. a collection of multilingual treebanks with comparable annotation. The formulae can be arbitrarily complex expressions of propositional logic. To illustrate the usefulness of such a framework, we present experiments on the Universal Dependencies treebanks for two use cases: (i) empirical (re-)evaluation of established formulae against the spectrum of available treebanks and (ii) evaluating new formulae (i.e. potential candidates for universals) generated by a search algorithm.
