Advances in computer-assisted methods have substantially reshaped linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, place high demands on the rigor with which datasets are prepared and curated. Here we present CLICS, a Database of Cross-Linguistic Colexifications. CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world's languages, and showcases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version, CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.
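To make the notion of colexification concrete: a language colexifies two concepts when it expresses both with the same word form. The following is a minimal sketch of how such pairs could be counted across languages. It assumes a flat wordlist of (language, concept, form) rows; the data, the function name `count_colexifications`, and the row layout are invented for illustration and are not the CLICS data format or tooling.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy wordlist of (language, concept, form) rows; not CLICS data.
TOY_WORDLIST = [
    ("lang_a", "TREE", "kava"),
    ("lang_a", "WOOD", "kava"),
    ("lang_a", "FIRE", "suli"),
    ("lang_b", "TREE", "tamu"),
    ("lang_b", "WOOD", "tamu"),
    ("lang_b", "FIRE", "neko"),
    ("lang_c", "TREE", "piko"),
    ("lang_c", "WOOD", "lomo"),
    ("lang_c", "FIRE", "lomo"),
]


def count_colexifications(wordlist):
    """Return {(concept_a, concept_b): number of languages colexifying them}."""
    # Group concepts by (language, form): concepts that share a form within
    # one language are colexified in that language.
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)

    # For every concept pair, collect the set of languages in which it occurs,
    # so that each language is counted at most once per pair.
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)

    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


if __name__ == "__main__":
    for pair, n_languages in sorted(count_colexifications(TOY_WORDLIST).items()):
        print(pair, n_languages)
```

Counting languages rather than raw word forms keeps a single language with several matching forms from inflating a pair's count. A real pipeline would also need to normalize forms and map glosses to shared concept identifiers, which this sketch leaves out.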
Internet troll messages, especially those posted on Twitter, have recently been recognised as a very powerful weapon in hybrid warfare. Hence, an important task for the academic community is to provide a tool for identifying internet troll accounts as quickly as possible. At the same time, this tool must be highly accurate so that its use does not violate people's rights or infringe on freedom of speech. Though such a task can be effectively fulfilled on purely linguistic grounds, very little work has yet been done that could help to explain the discourse-specific features of this type of writing. In this paper, we suggest a quantitative measure for identifying troll messages that takes into account certain sociolinguistic limitations of troll speech, and we discuss two algorithms that both require as few as 50 tweets to establish the true nature of the tweets, whether 'genuine' or 'troll-like'.
This paper examines the acquisition of demonstratives (e.g., that, there) from a cross-linguistic perspective. Although demonstratives are often said to play a crucial role in L1 acquisition, there is little systematic research on this topic. Using extensive corpus data of spontaneous child speech, the paper investigates the emergence and development of demonstratives in three European (English, French, Spanish) and four non-European languages (Japanese, Chinese, Hebrew, Indonesian) between age 1;0 and 6;0. The data show that, across languages, demonstratives are among the earliest and most frequent child words, but their frequency decreases with age and MLU. As children grow older, they tend to use other types of referring terms (e.g., anaphoric pronouns) and other types of spatial expressions (e.g., adpositions). Considering these results, we hypothesize that children shift from using a body-oriented strategy of deictic communication to more abstract and disembodied strategies of encoding reference and space during the preschool years.
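The two quantities related in the abstract above, MLU (mean length of utterance) and the frequency of demonstratives, can be illustrated with a small sketch. It rests on several assumptions that are not from the paper: MLU is computed in words rather than morphemes, the demonstrative list is a hypothetical English-only set, and the utterances are invented rather than drawn from any corpus.

```python
# Hypothetical English-only demonstrative list. Note that "there" can also be
# existential ("there is ..."), which this simple lookup does not distinguish.
DEMONSTRATIVES = {"this", "that", "these", "those", "here", "there"}


def tokenize(utterance):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (t.lower().strip(".,!?") for t in utterance.split())
    return [t for t in tokens if t]


def mlu_in_words(utterances):
    """Mean number of word tokens per non-empty utterance."""
    lengths = [len(tokenize(u)) for u in utterances]
    lengths = [n for n in lengths if n > 0]
    return sum(lengths) / len(lengths) if lengths else 0.0


def demonstrative_rate(utterances):
    """Share of all word tokens that are in the demonstrative list."""
    tokens = [t for u in utterances for t in tokenize(u)]
    if not tokens:
        return 0.0
    return sum(t in DEMONSTRATIVES for t in tokens) / len(tokens)


# Invented example utterances for two hypothetical recording sessions.
EARLY_SESSION = ["that", "there ball", "that doggie"]
LATER_SESSION = ["I want the red ball", "it is on the table", "put it there please"]

if __name__ == "__main__":
    for name, session in [("early", EARLY_SESSION), ("later", LATER_SESSION)]:
        print(name, round(mlu_in_words(session), 2), round(demonstrative_rate(session), 2))
```

In this toy data the later session has a higher MLU and a lower demonstrative rate, which mirrors the direction of the reported trend but is not evidence for it.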
The current study yielded a number of important findings. We built a neural network that achieved an accuracy of 91 per cent in classifying troll and genuine tweets. By means of regression analysis, we identified a number of features that make a tweet more likely to be labelled correctly and found that they are inherently present in troll tweets as a special type of discourse. We hypothesised that those features are grounded in the sociolinguistic limitations of troll writing, which can be best described as a combination of two factors: speaking with a purpose and trying to mask the purpose of speaking. Next, we contended that the orthogonal nature of these factors must necessarily result in a skewed distribution of many different language parameters of troll messages. Taking as an example the distribution of topics and of the vocabulary associated with those topics, we showed some very pronounced distributional anomalies, thus confirming our prediction.
There is little doubt that one of the most important areas of future research within the framework of Construction Grammar will be the comparative study of constructions in different languages of the world. One significant gain that modern Construction Grammar can make thanks to the cross-linguistic perspective is finding a clue to some contradictory cases of construction alternation. The aim of the present paper is to communicate the results of a case study of two pairs of alternating constructions in English and Russian: s-genitive (SG) and of-genitive (OG) in English and noun + noun in genitive case (NNG) and relative adjective derived from noun + noun (ANG) in Russian. It is evident that the long years of elaborate scientific analysis have not yielded any universally accepted view on the problem of English genitive alternation. There are at least five different accounts of this problem: the hypotheses of the animacy hierarchy, given-new hierarchy, topic-focus hierarchy, end-weight principle, and two semantically distinct constructions. We hypothesised that in this case the comparison of the distribution of two English and two Russian genitives could be insightful. The analysis involved two consecutive steps. First, we established the inter-language comparability of the two pairs of constructions in English and Russian. Second, we tested the similarity of the intra-language distribution of each pair of constructions from the perspective of the animacy hierarchy. For these two purposes, two types of corpora were used: (1) a translation corpus consisting of original texts in one language and their translations into one or more languages; and (2) national corpora consisting of original texts in the two respective languages. It was established that in both languages, the choice between members of an alternating pair is governed by the rules of animacy hierarchisation.
Additionally, it was possible to disprove the idea that the animacy hierarchy is necessarily based on the linearisation hierarchy. The two Russian constructions are typologically aligned with their English counterparts, not on the grounds of the linear order of head and modifier but on the grounds of structural similarity. The English SG and Russian NNG constructions are diametrically opposed in terms of word order. However, they reveal the same underlying structure of the inflectional genitive, as contrasted with the analytical genitive of the Russian ANG and the English OG. These findings speak strongly in favour of the animacy hierarchy account of English genitive alternation.