Chris Biemann scite author profile

Distributional representations of words have been recently used in supervised settings for recognizing lexical inference relations between word pairs, such as hypernymy and entailment. We investigate a collection of these state-of-the-art methods, and show that they do not actually learn a relation between two words. Instead, they learn an independent property of a single word in the pair: whether that word is a "prototypical hypernym".

show abstract

That's sick dude!: Automatic identification of word sense change across different timescales

Mitra

Riedl

et al. 2014

105

View full text Add to dashboard Cite

In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct distributional thesauri based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we compare these sense clusters of two different time points to find if (i) there is birth of a new sense or (ii) if an older sense has got split into more than one sense or (iii) if a newer sense has been formed from the joining of older senses or (iv) if a particular sense has died. We conduct a thorough evaluation of the proposed methodology both manually as well as through comparison with WordNet. Manual evaluation indicates that the algorithm could correctly identify 60.4% birth cases from a set of 48 randomly picked samples and 57% split/join cases from a set of 21 randomly picked samples. Remarkably, in 44% cases the birth of a novel sense is attested by WordNet, while in 46% cases and 43% cases split and join are respectively confirmed by WordNet. Our approach can be applied for lexicography, as well as for applications like word sense disambiguation or semantic search.

show abstract

A Report on the Complex Word Identification Shared Task 2018

Yimam¹,

Biemann²,

Malmasi³

et al. 2018

View full text Add to dashboard Cite

We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop colocated with NAACL-HLT'2018. The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks: English monolingual, German monolingual, Spanish monolingual, and a multilingual track with a French test set, and two tasks: binary classification and probabilistic classification. A total of 12 teams submitted their results in different task/track combinations and 11 of them wrote system description papers that are referred to in this report and appear in the BEA workshop proceedings.

show abstract

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

Mathew

Saha

Yimam

et al. 2021

AAAI

144

View full text Add to dashboard Cite

Hate speech is a challenging issue plaguing the online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this paper, we introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue. Each post in our dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which their labelling decision (as hate, offensive or normal) is based. We utilize existing state-of-the-art models and observe that even models that perform very well in classification do not score high on explainability metrics like model plausibility and faithfulness. We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities. We have made our code and dataset public for other researchers.

show abstract

Making Sense of Word Embeddings

Pelevina¹,

Arefiev²,

Biemann³

et al. 2016

View full text Add to dashboard Cite

We present a simple yet effective approach for learning word sense embeddings. In contrast to existing techniques, which either directly learn sense representations from corpora or rely on sense inventories from lexical resources, our approach can induce a sense inventory from existing word embeddings via clustering of ego-networks of related words. An integrated WSD mechanism enables labeling of words in context with learned sense vectors, which gives rise to downstream applications. Experiments show that the performance of our method is comparable to state-of-the-art unsupervised WSD systems.

show abstract

TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling

Panchenko

Faralli

Ruppert

et al. 2016

View full text Add to dashboard Cite

We present a system for taxonomy construction that reached the first place in all subtasks of the SemEval 2016 challenge on Taxonomy Extraction Evaluation. Our simple yet effective approach harvests hypernyms with substring inclusion and Hearst-style lexicosyntactic patterns from domain-specific texts obtained via language model based focused crawling. Extracted taxonomies are evaluated on English, Dutch, French and Italian for three domains each (Food, Environment and Science). Evaluations against a gold standard and by human judgment show that our method outperforms more complex and knowledge-rich approaches on most domains and languages. Furthermore, to adapt the method to a new domain or language, only a small amount of manual labour is needed.

show abstract

Remembering Words in Context as Predicted by an Associative Read-Out Model

Hofmann¹,

Kuchinke²,

Biemann³

et al. 2011

Front. Psychology

View full text Add to dashboard Cite

Interactive activation models (IAMs) simulate orthographic and phonological processes in implicit memory tasks, but they neither account for associative relations between words nor explicit memory performance. To overcome both limitations, we introduce the associative read-out model (AROM), an IAM extended by an associative layer implementing long-term associations between words. According to Hebbian learning, two words were defined as “associated” if they co-occurred significantly often in the sentences of a large corpus. In a study-test task, a greater amount of associated items in the stimulus set increased the “yes” response rates of non-learned and learned words. To model test-phase performance, the associative layer is initialized with greater activation for learned than for non-learned items. Because IAMs scale inhibitory activation changes by the initial activation, learned items gain a greater signal variability than non-learned items, irrespective of the choice of the free parameters. This explains why the slope of the z-transformed receiver-operating characteristics (z-ROCs) is lower one during recognition memory. When fitting the model to the empirical z-ROCs, it likewise predicted which word is recognized with which probability at the item-level. Since many of the strongest associates reflect semantic relations to the presented word (e.g., synonymy), the AROM merges form-based aspects of meaning representation with meaning relations between words.

show abstract

Text: now in 2D! A framework for lexical expansion with contextual similarity

Biemann

Riedl

2013

JLM

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Chris Biemann

Do Supervised Distributional Methods Really Learn Lexical Inference Relations?

That's sick dude!: Automatic identification of word sense change across different timescales

A Report on the Complex Word Identification Shared Task 2018

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

Making Sense of Word Embeddings

TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling

Remembering Words in Context as Predicted by an Associative Read-Out Model

Text: now in 2D! A framework for lexical expansion with contextual similarity

Contact Info

Product

Resources

About