Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have rarely been modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
We propose a framework for dependency parsing based on a combination of discriminative and generative models. We use a discriminative model to obtain a k-best list of candidate parses, and subsequently rerank those candidates using a generative model. We show how this approach allows us to evaluate a variety of generative models without needing different parser implementations. Moreover, we present empirical results that show a small improvement over state-of-the-art dependency parsing of English sentences.
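The two-stage setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the parse representation (a tuple of head indices), the function names `rerank` and `toy_generative_logprob`, and the toy scoring rule are all assumptions made for illustration. The point it shows is that the generative model only needs to score complete candidate parses, so different generative models can be swapped in without touching the parser that produced the k-best list.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: heads[i] is the head of token i+1, 0 is the root.
Parse = Tuple[int, ...]


def rerank(
    kbest: List[Tuple[Parse, float]],
    generative_logprob: Callable[[Parse], float],
) -> List[Tuple[Parse, float]]:
    """Re-sort a k-best list by the score a generative model assigns to each parse.

    The discriminative scores in `kbest` are only used to produce the list;
    they are ignored here (an assumption, the abstract does not say otherwise).
    """
    rescored = [(parse, generative_logprob(parse)) for parse, _ in kbest]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def toy_generative_logprob(parse: Parse) -> float:
    """Stand-in for a real generative model: penalise long head-dependent distances."""
    return -float(sum(abs(head - dep) for dep, head in enumerate(parse, start=1)))


# Three candidate parses of a three-token sentence with made-up discriminative scores.
kbest = [((2, 0, 2), 1.5), ((0, 1, 2), 1.7), ((2, 0, 1), 1.2)]
best_parse, best_score = rerank(kbest, toy_generative_logprob)[0]
```

Any function with the signature `Parse -> float` can replace `toy_generative_logprob`, which is what makes comparing several generative models cheap in this kind of setup.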
The social brain hypothesis proposes that enlarged brains have evolved in response to the increasing cognitive demands that complex social life in larger groups places on primates and other mammals. However, this reasoning can be challenged by evidence that brain size has decreased in the evolutionary transitions from solitary life to larger social groups in the case of Neolithic humans and some eusocial insects. Different hypotheses can be identified in the literature to explain this reduction in brain size. We evaluate some of them from the perspective of recent approaches to cognitive science, which support the idea that the basis of cognition can span brain, body, and environment. Here we show, through a minimal cognitive model using an evolutionary robotics methodology, that the neural complexity (in terms of neural entropy and degrees of freedom of neural activity) of smaller-brained agents evolved in social interaction is comparable to the neural complexity of larger-brained agents evolved in solitary conditions. The nonlinear time series analysis of agents' neural activity reveals that the decoupled smaller neural network is intrinsically lower dimensional than the decoupled larger neural network. However, when smaller-brained agents are interacting, their actual neural complexity goes beyond its intrinsic limits, achieving results comparable to those obtained by larger-brained solitary agents. This suggests that the smaller-brained agents are able to enhance their neural complexity through social interaction, thereby offsetting the reduced brain size.
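As a rough illustration of what an entropy-based complexity measure over neural activity can look like, the sketch below computes the Shannon entropy of a single neuron's activity after binning. The abstract does not specify the estimator used, so the binning scheme, the value range of 0 to 1, and the function name `shannon_entropy` are assumptions.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(
    activity: Sequence[float], bins: int = 4, low: float = 0.0, high: float = 1.0
) -> float:
    """Shannon entropy (in bits) of a time series after equal-width binning.

    Assumes activity values lie in [low, high]; values equal to `high`
    are placed in the last bin.
    """
    width = (high - low) / bins
    counts = Counter(min(int((x - low) / width), bins - 1) for x in activity)
    total = len(activity)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A neuron that visits every bin equally often versus one stuck at a single value.
varied = shannon_entropy([0.1, 0.3, 0.6, 0.9])
constant = shannon_entropy([0.5, 0.5, 0.5, 0.5])
```

Under this measure, a neuron whose activity spreads evenly over the bins reaches the maximum of `log2(bins)` bits, while a neuron with constant activity has zero entropy.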
In this paper we describe DialettiBot, a Telegram-based chatbot for crowdsourcing geo-referenced voice recordings of Italian dialects. The system enables people to listen to previously recorded audio and encourages them to contribute to building a collective linguistic resource by sending voice recordings of their own spoken dialects. The project aims at collecting a large sample of voice recordings in order to promote knowledge of linguistic variation and preserve proverbs or idioms typical of different local dialects. Moreover, the collected data can contribute to several voice-based Natural Language Processing (NLP) applications by helping them understand utterances in non-standard Italian.
In this paper, we present the first incremental parser for Tree Substitution Grammar (TSG). A TSG allows arbitrarily large syntactic fragments to be combined into complete trees; we show how constraints (including lexicalization) can be imposed on the shape of the TSG fragments to enable incremental processing. We propose an efficient Earley-based algorithm for incremental TSG parsing and report an F-score competitive with other incremental parsers. In addition to whole-sentence F-score, we also evaluate the partial trees that the parser constructs for sentence prefixes; partial trees play an important role in incremental interpretation, language modeling, and psycholinguistics. Unlike existing parsers, our incremental TSG parser can generate partial trees that include predictions about the upcoming words in a sentence. We show that it outperforms an n-gram model in predicting more than one upcoming word.
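The F-score mentioned above is not defined in the abstract. A common choice in constituency parsing is a labelled bracketing F-score, and the sketch below assumes that convention: each tree (whole-sentence or partial) is reduced to a set of `(label, start, end)` spans, and precision and recall are computed over matching spans. The span encoding and the function name `bracket_f1` are illustrative assumptions.

```python
from typing import Set, Tuple

# Hypothetical span encoding: (constituent label, start index, end index).
Span = Tuple[str, int, int]


def bracket_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Harmonic mean of labelled bracketing precision and recall."""
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: the predicted tree mislabels one of four constituents.
gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}
predicted = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)}
score = bracket_f1(gold, predicted)
```

The same function can be applied to the spans of a partial tree built for a sentence prefix, provided the gold spans are restricted to that prefix in the same way.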