Productivity in Argument Selection

Zeldes, Amir

doi:10.1515/9783110303919

Cited by 60 publications

(45 citation statements)

References 0 publications


“…This assertion neglects a fundamental property of the frequency distribution of words (Baayen, 2001; Baroni, 2008; Zanette & Montemurro, 2005; Zipf, 1935/1965), but also of word sequences (Bannard & Lieven, 2009; Baroni, 2008; Ha, Sicilia-Garcia, Ming, & Smith, 2002), in human languages: its Zipfian nature, which has been observed in each analysed natural language and for all the lengths of texts and corpora from a few thousand words up to several tens of millions. In any text or corpus, 'a few words occur with very high frequency while many words occur but rarely' (Zipf, 1935/1965), and this overrepresentation of rare items is larger for smaller texts and corpora (Baayen, 2001; McEnery & Gabrielatos, 2006; Zeldes, 2013; Zipf, 1935/1965). However, when the same normalized frequency threshold is used in corpora of different sizes, this overrepresentation of rare words and rare sequences in the smaller corpora is not taken into account and a disproportionately large number of word sequences is selected from them.…”

Section: Discussion (mentioning)

confidence: 99%

Comparing Lexical Bundles across Corpora of Different Sizes: The Zipfian Problem

Bestgen

2019

Journal of Quantitative Linguistics


Formulaic sequences in language use are often studied by means of the automatic identification of frequently recurring series of words, often referred to as 'lexical bundles', in corpora that contrast different registers, academic disciplines etc. As corpora often differ in size, a critically important assumption in this field states that the use of a normalized frequency threshold, such as 20 occurrences per million words, allows for an accurate comparison of corpora of different sizes. Yet, several researchers have argued that normalization may be unreliable when applied to frequency threshold. The study investigates this issue by comparing the number of lexical bundles identified in corpora that differ only in size. Using two complementary random sampling procedures, subcorpora of 100,000 to two million words were extracted from five corpora, with lexical bundles identified in them using two normalized frequency thresholds and two dispersion thresholds. The results show that many more lexical bundles are identified in smaller subcorpora than in larger ones. This size effect can be related to the Zipfian nature of the distribution of words and word sequences in corpora. The conclusion discusses several solutions to avoid the unfairness of comparing lexical bundles identified in corpora of different sizes.
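The size effect described in this abstract can be made concrete with a minimal sketch. The helper names `raw_threshold` and `lexical_bundles` are hypothetical and not taken from the cited study; the sketch only assumes that a normalized threshold (e.g. 20 per million words) is converted to a raw count by simple proportional scaling, which is why the raw cut-off becomes very low in small corpora.

```python
import math
from collections import Counter


def raw_threshold(per_million: float, corpus_size: int) -> int:
    """Smallest raw count that satisfies a per-million-words threshold."""
    return max(1, math.ceil(per_million * corpus_size / 1_000_000))


def lexical_bundles(tokens, n, per_million):
    """Return the n-grams whose raw count reaches the normalized threshold.

    Illustration only: no dispersion criterion is applied here.
    """
    threshold = raw_threshold(per_million, len(tokens))
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= threshold}


# The same normalized threshold maps to very different raw cut-offs.
for size in (100_000, 500_000, 2_000_000):
    print(size, raw_threshold(20, size))
```

With a threshold of 20 per million, a sequence needs only 2 occurrences to qualify in a 100,000-word subcorpus but 40 in a two-million-word one. Because rare sequences are overrepresented in small samples, far more of them clear the lower bar.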


“…In recent years, the concept has entered syntax, notably since the advent of Construction Grammar (among others Barðdal, 2008; Zeldes, 2012; Perek, 2016). The productivity of a (syntactic) construction consists in the number of different lexemes (i.e.…”

Section: Diversification at the Level of Lexical Types (unclassified)

C'est très théâtre, c'est-à-dire très faux. Les origines et le développement de la construction [Adverbe_degré + Nom]

Lauwers

2018

French Language Studies


Abstract: In this contribution, we examine the origin and development of the [degree adverb + noun] construction: C'est très théâtre, c'est-à-dire très faux ('It's very theatre, that is to say very fake'). Today, this construction serves as a powerful tool of contextual recategorization for expressing a relation of resemblance based on an analogy with a nominal concept. On the basis of tool-assisted corpus research, we show that the host construction [ADVdegree + ADJ] was able to open up to the nominal category starting from a small hard core of human nouns, some of them qualitative, taking advantage of certain structural properties of the French of the period. Subsequently, the construction diversified towards other semantic classes (inanimate nouns, proper nouns, etc.) and gradually drew a more diverse range of lexical types into its orbit. It has continued to gain in productivity, while reducing its original lexical stock to a bare minimum.

“…The idea behind using such counts to assess productivity can be understood intuitively if we consider that the attested vocabulary size of a certain process corresponds to how productive it has been up until now. Thus a process with more types has a higher 'realized productivity', in Baayen's terms, than one with fewer types (see also Barðdal 2008; Zeldes 2012). On the other hand, to assess how prone a process is to forming neologisms (regardless of whether it is used often or rarely), we may want to know what the proportion of neologisms is in its output – a process with mostly neologisms is very productive, whereas a repetitive process, with few neologisms, has little 'potential productivity', no matter how large its realized vocabulary so far.…”

Section: Synthetic Compounds and Productivity (mentioning)

confidence: 99%

Between VP and NN

Gaeta

Zeldes

2017

Self Cite


This paper is concerned with the classification and analysis of different types of German synthetic compounds headed by deverbal agent nouns in -er, such as Romanleser ‘novel-reader’ or Gedankenleser ‘mind-reader’, where the non-head is seen to saturate an argument of the head lexeme while adhering to the semantic interpretation found in corresponding VPs (e.g. the distinct senses of read in the previous examples). In contrast to several previous approaches, which attempt to explain the relationship between VPs and compounds using a unified mechanism of incorporation or derivation, we argue that different compounding patterns require different analyses and that the respective constructions are to some extent independent of each other. While some compounds are modelled after frequent, familiar VPs and take account of the usage profile of syntactic phrases, other productive sets of compounds extend independently lexicalized schemas with fixed compound heads. To support our analysis we undertake the largest empirical survey of these formations to date, using a broad coverage Web corpus. We suggest several categories of verb-object lexeme pairs to account for our data and formulate an analysis of the facts within the framework of Construction Morphology.
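The two measures contrasted in the citation statement for this entry can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: realized productivity is taken as the number of distinct types attested for a process, and potential productivity is approximated by the share of hapax legomena (types occurring exactly once) among all tokens, a common corpus proxy for neologisms. The function name `productivity` and the toy word list are hypothetical.

```python
from collections import Counter


def productivity(tokens):
    """Compute two simple productivity measures for one word-formation process.

    realized  : number of distinct types attested so far
    potential : hapax legomena / total tokens (assumed proxy for neologisms)
    """
    counts = Counter(tokens)
    n_tokens = sum(counts.values())
    n_types = len(counts)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    potential = hapaxes / n_tokens if n_tokens else 0.0
    return {"realized": n_types, "potential": potential}


# Toy sample of -er compounds (invented counts, for illustration only).
sample = ["Romanleser", "Romanleser", "Gedankenleser", "Zeitungsleser"]
print(productivity(sample))
```

In the toy sample there are three types and two hapaxes among four tokens, so the realized value is 3 and the potential value is 0.5. A process could score high on the first measure and low on the second if it has many established types but rarely forms new ones.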

