Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media 2017
DOI: 10.18653/v1/W17-1106

A Twitter Corpus and Benchmark Resources for German Sentiment Analysis

Abstract: In this paper we present SB10k, a new corpus for sentiment analysis with approx. 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM and compare their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using d…
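A common way to pre-train such a CNN or tune tweet embeddings for sentiment is distant supervision: tweets are weakly labeled by the emoticons they contain, yielding a large noisy training set. The sketch below illustrates that general idea only; the emoticon lists, label names, and example tweets are illustrative assumptions, not the paper's actual pipeline.

# Hedged sketch of distant supervision for sentiment: derive noisy labels
# from emoticons so a model can be pre-trained on millions of unlabeled
# tweets. All specifics here are illustrative assumptions.

POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(", ";(")

def weak_label(tweet: str):
    """Return a noisy sentiment label derived from emoticons, or None."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # ambiguous or no emoticon: skip this tweet

print(weak_label("Tolles Spiel heute :)"))   # positive
print(weak_label("Schon wieder Regen :("))   # negative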

Cited by 53 publications (31 citation statements) · References 19 publications
“…Given the computational complexity of identifying particle-verb combinations when the particle appears at a distance, it is highly likely that for split particle verbs, the base verb of the verb-particle combination is processed as if it were a simple verb (e.g., werfe, wirfst, wirft, werfen, and werft: 1st, 2nd, and 3rd person singular and plural present, respectively). As a consequence, the semantic similarity of simple verbs and particle verbs computed from the word embeddings provided by Cieliebak et al. (2017) and Deriu et al. (2017) is in all likelihood larger than it should be. Not all words in the experiment are in this database, but for six words we were able to replace the infinitive by a related form (einpassen → reinpassen, verqualmen → verqualmt, fortlaufen → fortlaufend, bestürzen → bestürzend, verfinstern → verfinstert, beschneien → beschneites).…”
Section: Semantic Vectors From Tweets
confidence: 90%
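The concern in this passage, inflated similarity between simple verbs and their particle verbs, is easy to make concrete. Below is a minimal sketch using gensim; the embedding filename is a hypothetical stand-in, and the assumption that the distributed vectors are in standard word2vec text format may not hold for the actual download.

# Hedged sketch: probe a pre-trained German tweet embedding space with
# gensim (4.x API assumed). The filename is hypothetical.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "german_tweet_embeddings.txt", binary=False
)

# Check vocabulary first: the quote notes some infinitives are missing and
# must be replaced by a related attested form (fortlaufen -> fortlaufend).
for word in ("laufen", "fortlaufen", "fortlaufend"):
    print(word, word in vectors.key_to_index)

if "laufen" in vectors.key_to_index and "fortlaufend" in vectors.key_to_index:
    # If split occurrences ("laufen ... fort") were indexed as plain
    # "laufen" during training, the particle verb's vector inherits
    # simple-verb contexts, so this cosine similarity is likely
    # overestimated, as the quoted passage argues.
    print(vectors.similarity("laufen", "fortlaufend"))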
“…As LDL-based semantic vectors for German are currently under construction, we fell back on the word embeddings (semantic vectors) provided at http://www.spinningbytes.com/resources/wordembeddings/ (Cieliebak et al., 2017; Deriu et al., 2017). These embeddings (obtained with word2vec; Mikolov et al., 2013) are 300-dimensional vectors derived from a 50 million word corpus of German tweets.…”
Section: Semantic Vectors From Tweets
confidence: 99%
“…To test whether the above results are specific to the NDL-based semantic vectors that we used, a separate analysis was carried out by using a different algorithm for constructing semantic vectors, applied to a different language register. We downloaded the word embeddings from https://www.spinningbytes.com/resources/wordembeddings/ (Cieliebak et al., 2017; Deriu et al., 2017). These embeddings are 200-dimensional vectors, which were trained with Word2Vec on 200 million tweets. A summary of the GAM fitted to the acoustic durations is provided in Table 9.…”
Section: Appendix: LDL With Tweet-Based Word2Vec Embeddings
confidence: 99%
“…We built 100-dimensional word embeddings from CODE ALLTAG XL (Krieg-Holz et al., 2016) using WORD2VEC (Mikolov et al., 2013) for all words occurring at least 3 times in CODE ALLTAG XL. Furthermore, we employed WORD2VEC word embeddings from Reimers et al. (2014) with a minimum word frequency of 5 and 100 dimensions (UKP), 300-dimensional FASTTEXT word embeddings from SPINNINGBYTES (Cieliebak et al., 2017) trained on German tweets (TWITTER) and, finally, FASTTEXT word embeddings (Grave et al., 2018) based on COMMON CRAWL and WIKIPEDIA (FASTTEXT). We also tried to utilize embeddings generated from the German TWITTER HATESPEECH corpora from Ross et al. (2016) and Wiegand et al. (2018b) under the assumption that they might contain a large number of rough and vulgar words.…”
Section: Regression Models
confidence: 99%
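The first step this quote describes, building 100-dimensional WORD2VEC embeddings over all words occurring at least 3 times, maps directly onto standard tooling. A minimal sketch with gensim follows; the corpus filename and whitespace tokenization are hypothetical stand-ins, not the cited authors' actual pipeline.

# Hedged sketch of the kind of training run the quote describes:
# 100-dimensional word2vec embeddings, keeping only words that occur at
# least 3 times. Corpus file and tokenization are hypothetical.
from gensim.models import Word2Vec

with open("code_alltag_xl.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]  # naive whitespace tokens

model = Word2Vec(
    sentences,
    vector_size=100,  # 100-dimensional embeddings, as in the quote
    min_count=3,      # drop words occurring fewer than 3 times
    workers=4,
)
model.wv.save("code_alltag_xl_100d.kv")  # keep just the vectors for reuse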