Gijs Wijnholds scite author profile

This paper is about learning word representations using grammatical type information. We use the syntactic types of Combinatory Categorial Grammar to develop multilinear representations, i.e. maps with n arguments, for words with different functional types. The multilinear maps of words compose with each other to form sentence representations. We extend the skipgram algorithm from vectors to multilinear maps to learn these representations and instantiate it on unary and binary maps for transitive verbs. These are evaluated on verb and sentence similarity and disambiguation tasks and a subset of the SICK relatedness dataset. Our model performs better than previous typedriven models and is competitive with state of the art representation learning methods such as BERT and neural sentence encoders.

show abstract

Categorical Vector Space Semantics for Lambek Calculus with a Relevant Modality (Extended Abstract)

McPheat

Sadrzadeh

Wazni

et al. 2021

Electron. Proc. Theor. Comput. Sci.

View full text Add to dashboard Cite

We develop a categorical compositional distributional semantics for Lambek Calculus with a Relevant Modality, !L * , which has a limited version of the contraction and permutation rules. The categorical part of the semantics is a monoidal biclosed category with a coalgebra modality as defined on Differential Categories. We instantiate this category to finite dimensional vector spaces and linear maps via "quantisation" functors and work with three concrete interpretations of the coalgebra modality. We apply the model to construct categorical and concrete semantic interpretations for the motivating example of !L * : the derivation of a phrase with a parasitic gap. The effectiveness of the concrete interpretations are evaluated via a disambiguation task, on an extension of a sentence disambiguation dataset to parasitic gap phrases, using BERT, Word2Vec, and FastText vectors and Relational tensors.

show abstract

Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings

Wijnholds

Sadrzadeh

2019

View full text Add to dashboard Cite

Ellipsis is a natural language phenomenon where part of a sentence is missing and its information must be recovered from its surrounding context, as in "Cats chase dogs and so do foxes.". Formal semantics has different methods for resolving ellipsis and recovering the missing information, but the problem has not been considered for distributional semantics, where words have vector embeddings and combinations thereof provide embeddings for sentences. In elliptical sentences these combinations go beyond linear as copying of elided information is necessary. In this paper, we develop different models for embedding VP-elliptical sentences. We extend existing verb disambiguation and sentence similarity datasets to ones containing elliptical phrases and evaluate our models on these datasets for a variety of non-linear combinations and their linear counterparts. We compare results of these compositional models to state of the art holistic sentence encoders. Our results show that non-linear addition and a non-linear tensor-based composition outperform the naive non-compositional baselines and the linear models, and that sentence encoders perform well on sentence similarity, but not on verb disambiguation.

show abstract

Classical Copying versus Quantum Entanglement in Natural Language: The Case of VP-ellipsis

Wijnholds

Sadrzadeh

2018

Electron. Proc. Theor. Comput. Sci.

View full text Add to dashboard Cite

This paper compares classical copying and quantum entanglement in natural language by considering the case of verb phrase (VP) ellipsis. VP ellipsis is a non-linear linguistic phenomenon that requires the reuse of resources, making it the ideal test case for a comparative study of different copying behaviours in compositional models of natural language. Following the line of research in compositional distributional semantics set out by [6] we develop an extension of the Lambek calculus which admits a controlled form of contraction to deal with the copying of linguistic resources. We then develop two different compositional models of distributional meaning for this calculus. In the first model, we follow the categorical approach of [5] in which a functorial passage sends the proofs of the grammar to linear maps on vector spaces and we use Frobenius algebras to allow for copying. In the second case, we follow the more traditional approach that one finds in categorial grammars, whereby an intermediate step interprets proofs as non-linear lambda terms, using multiple variable occurrences that model classical copying. As a case study, we apply the models to derive different readings of ambiguous elliptical phrases and compare the analyses that each model provides.

show abstract

SICK-NL: A Dataset for Dutch Natural Language Inference

Wijnholds¹,

Moortgat²

2021

View full text Add to dashboard Cite

We present SICK-NL (read: signal), a dataset targeting Natural Language Inference in Dutch. SICK-NL is obtained by translating the SICK dataset of Marelli et al. (2014) from English into Dutch. Having a parallel inference dataset allows us to compare both monolingual and multilingual NLP models for English and Dutch on the two tasks. In the paper, we motivate and detail the translation process, perform a baseline evaluation on both the original SICK dataset and its Dutch incarnation SICK-NL, taking inspiration from Dutch skipgram embeddings and contextualised embedding models. In addition, we encapsulate two phenomena encountered in the translation to formulate stress tests and verify how well the Dutch models capture syntactic restructurings that do not affect semantics. Our main finding is all models perform worse on SICK-NL than on SICK, indicating that the Dutch dataset is more challenging than the English original. Results on the stress tests show that models don't fully capture word order freedom in Dutch, warranting future systematic studies.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Gijs Wijnholds

Representation Learning for Type-Driven Composition

Categorical Vector Space Semantics for Lambek Calculus with a Relevant Modality (Extended Abstract)

Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings

Classical Copying versus Quantum Entanglement in Natural Language: The Case of VP-ellipsis

SICK-NL: A Dataset for Dutch Natural Language Inference

Contact Info

Product

Resources

About