Wordform Similarity Increases With Semantic Similarity: An Analysis of 100 Languages

Dautriche, Isabelle; Mahowald, Kyle; Gibson, Edward; Piantadosi, Steven T.

doi:10.1111/cogs.12453

Cited by 57 publications

(69 citation statements)

References 53 publications


“…This is probably driven by the differences in phonologies between languages (or possibly due to iconic sound-meaning associations across languages, Blasi et al, 2016; Dautriche et al, 2016), raising the possibility that the question words are not special. However, the Ef for question words is lower than for the other sets and the z-values are twice as extreme, as can be seen in figure 4 which shows the comparisons of Ef to the permuted distributions.…”

Section: Sample (mentioning)

confidence: 99%

A case for systematic sound symbolism in pragmatics: Universals in wh-words

Slonimska

Roberts

2017

Journal of Pragmatics


This study investigates whether there is a universal tendency for content interrogative words (wh-words) within a language to sound similar in order to facilitate pragmatic inference in conversation. Gaps between turns in conversation are very short, meaning that listeners must begin planning their turn as soon as possible. While previous research has shown that paralinguistic features such as prosody and eye gaze provide cues to the pragmatic function of upcoming turns, we hypothesise that a systematic phonetic cue that marks interrogative words would also help early recognition of questions (allowing early preparation of answers), for instance wh-words sounding similar within a language. We analyzed 226 languages from 66 different language families by means of permutation tests. We found that initial segments of wh-words were more similar within a language than between languages, also when controlling for language family, geographic area (stratified permutation) and analyzability (compound phrases excluded). Random samples tests revealed that initial segments of wh-words were more similar than initial segments of randomly selected word sets and conceptually related word sets (e.g., body parts, actions, pronouns). Finally, we hypothesized that this cue would be more useful at the beginning of a turn, so the similarity of the initial segment of wh-words should be greater in languages that place them at the beginning of a clause. We gathered typological data on 110 languages, and found the predicted trend, although statistical significance was not attained. While there may be several mechanisms that bring about this pattern (e.g., common derivation), we suggest that the ultimate explanation of the similarity of interrogative words is to facilitate early speech-act recognition. Importantly, this hypothesis can be tested empirically, and the current results provide a sound basis for future experimental tests.


“…On the one hand, methods proposals and critiques accompanied by exploratory results (Dubossarsky et al, 2017; Frermann and Lapata, 2016; Gulordava and Baroni, 2011; Hamilton et al, 2016b; Jatowt and Duh, 2014; Kulkarni et al, 2015; Sagi et al, 2011; Schlechtweg et al, 2017; Wijaya and Yeniterzi, 2011). On the other, applications of these methods, usually with more specific linguistic questions in mind (Dautriche et al, 2016; Dubossarsky et al, 2016; Hamilton et al, 2016a; Perek, 2016; Rodda et al, 2016; Xu and Kemp, 2015). Notably, all of these approaches are, one way or another, based on (co-occurrence) frequencies of words, and as such naturally subject to sampling biases potentially introduced by uneven representation of topics and genres in a corpus.…”

Section: Previous Research (mentioning)

confidence: 99%

Quantifying the dynamics of topical fluctuations in language

et al. 2020


The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide variety of reasons, including purely random sampling effects, or because corpora are composed of contemporary media and fiction texts within which the underlying topics ebb and flow with cultural and socio-political trends. In this work, we introduce a simple model for controlling for topical fluctuations in corpora (the topical-cultural advection model) and demonstrate how it provides a robust baseline of variability in word frequency changes over time. We validate the model on a diachronic corpus spanning two centuries, and a carefully-controlled artificial language change scenario, and then use it to correct for topical fluctuations in historical time series. Finally, we use the model to show that the emergence of new words typically corresponds with the rise of a trending topic. This suggests that some lexical innovations occur due to growing communicative need in a subspace of the lexicon, and that the topical-cultural advection model can be used to quantify this.


“…Unlike Dautriche et al (2017), who draw lexicons from Wikipedia, or Otis and Sagi (2008), we directly use a phone string representation, rather than their proxy of using each language's orthography. This makes our work the first to quantify the interface between phones and meaning in a massively multilingual setting.…”

Section: Datasets (mentioning)

confidence: 99%

Meaning to Form: Measuring Systematicity as Information

Pimentel

McCarthy

Blasi

et al. 2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics


A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade? For instance, does the character bigram gl have any systematic relationship to the meaning of words like glisten, gleam and glow? In this work, we offer a holistic quantification of the systematicity of the sign using mutual information and recurrent neural networks. We employ these in a data-driven and massively multilingual approach to the question, examining 106 languages. We find a statistically significant reduction in entropy when modeling a word form conditioned on its semantic representation. Encouragingly, we also recover well-attested English examples of systematic affixes. We conclude with the meta-point: Our approximate effect size (measured in bits) is quite small. Despite some amount of systematicity between form and meaning, an arbitrary relationship and its resulting benefits dominate human language.
