Verb-noun combinations (VNCs), e.g., blow the whistle, hit the roof, and see stars, are a common type of English idiom that is ambiguous with literal usages. In this paper we propose and evaluate models for classifying VNC usages as idiomatic or literal, based on a variety of approaches to forming distributed representations. Our results show that a model based on averaging word embeddings performs on par with, or better than, a previously proposed approach based on skip-thoughts. Idiomatic usages of VNCs are known to exhibit lexico-syntactic fixedness. We further incorporate this information into our models, demonstrating that this rich linguistic knowledge is complementary to the information carried by distributed representations.
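As a rough illustration of the averaging idea in the abstract above, the sketch below represents a usage as the mean of its word vectors and assigns the label of the closest class centroid. The abstract does not say which classifier is used, so the nearest-centroid rule, the function names, and the two-dimensional toy vectors are all assumptions made for illustration, not the authors' setup.

```python
from math import sqrt


def average_embedding(tokens, embeddings, dim):
    """Return the mean vector of the in-vocabulary tokens (zero vector if none)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def nearest_centroid_label(vector, centroids):
    """Pick the label whose centroid is most similar to the usage vector.

    Assumption: a stand-in classifier, not the one used in the paper.
    """
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical toy data: real embeddings would have hundreds of dimensions.
TOY_EMBEDDINGS = {"blow": [1.0, 0.0], "whistle": [0.0, 1.0]}
TOY_CENTROIDS = {"idiomatic": [1.0, 0.0], "literal": [0.0, 1.0]}

usage_vector = average_embedding(["blow", "the", "whistle"], TOY_EMBEDDINGS, dim=2)
label = nearest_centroid_label([0.9, 0.1], TOY_CENTROIDS)
```

Out-of-vocabulary tokens such as "the" are simply skipped here; a real system would need a deliberate policy for them.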
Abstract: EUROTRA is a machine translation system currently being planned under the auspices of the Commission of the European Communities. From its conception, certain objectives were set which any European system must meet. These objectives constitute a set of criteria that are constantly used in making design decisions. This paper takes each of these objectives in turn and attempts to describe its consequences for the overall design of the system. Since EUROTRA is intended to profit as much as possible from the current state of the art in machine translation, an attempt is also made to put it into context by referring to the characteristics of other systems with respect to the same criteria.
Usage similarity (USim) is an approach to determining word meaning in context that does not rely on a sense inventory. Instead, pairs of usages of a target lemma are rated on a scale. In this paper we propose unsupervised approaches to USim based on embeddings for words, contexts, and sentences, and achieve state-of-the-art results over two USim datasets. We further consider supervised approaches to USim, and find that although they outperform unsupervised approaches, they are unable to generalize to lemmas that are unseen in the training data.
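One simple unsupervised way to score a pair of usages, in the spirit of the embedding-based approaches mentioned above, is to compare the averaged embeddings of the words around the target. The sketch below is only an assumed instance of that idea: the window size, the function names, and the toy vectors are hypothetical and are not taken from the paper.

```python
from math import sqrt


def context_vector(tokens, target_index, embeddings, dim, window=2):
    """Average the embeddings of words within `window` positions of the target."""
    start = max(0, target_index - window)
    context = tokens[start:target_index] + tokens[target_index + 1:target_index + 1 + window]
    vectors = [embeddings[t] for t in context if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def usage_similarity(usage_a, usage_b, embeddings, dim, window=2):
    """Score two usages, each given as a (tokens, target_index) pair."""
    vec_a = context_vector(usage_a[0], usage_a[1], embeddings, dim, window)
    vec_b = context_vector(usage_b[0], usage_b[1], embeddings, dim, window)
    return cosine(vec_a, vec_b)


# Hypothetical toy vectors chosen so the two senses of "bank" separate cleanly.
TOY_EMBEDDINGS = {
    "river": [1.0, 0.0], "water": [1.0, 0.0],
    "money": [0.0, 1.0], "loan": [0.0, 1.0],
}
RIVER_USAGE_1 = (["river", "bank", "water"], 1)
RIVER_USAGE_2 = (["water", "bank", "river"], 1)
MONEY_USAGE = (["money", "bank", "loan"], 1)

same_sense = usage_similarity(RIVER_USAGE_1, RIVER_USAGE_2, TOY_EMBEDDINGS, dim=2)
different_sense = usage_similarity(RIVER_USAGE_1, MONEY_USAGE, TOY_EMBEDDINGS, dim=2)
```

No sense inventory or labeled data is involved: the score depends only on the two contexts, which is what makes the approach unsupervised.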
In this paper we present three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency. We show that, of these approaches, the simple approach based on word co-occurrence performs best. We further consider supervised and unsupervised approaches to combining information from these models, but these approaches do not improve on the word co-occurrence model.
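To make the co-occurrence idea concrete, the sketch below counts how many sentences contain a word together with a candidate attribute, and calls the attribute discriminative when it co-occurs more often with the first word than with the second. The abstract does not give the decision rule, so this comparison, the function names, and the four-sentence toy corpus are assumptions for illustration only.

```python
def sentence_cooccurrence(word, attribute, sentences):
    """Count the sentences (given as token sets) containing both word and attribute."""
    return sum(1 for tokens in sentences if word in tokens and attribute in tokens)


def is_discriminative(word1, word2, attribute, sentences):
    """Assumed rule: the attribute co-occurs more often with word1 than with word2."""
    count1 = sentence_cooccurrence(word1, attribute, sentences)
    count2 = sentence_cooccurrence(word2, attribute, sentences)
    return count1 > count2


# Hypothetical toy corpus; each sentence is a set of lowercased tokens.
TOY_SENTENCES = [
    {"the", "banana", "is", "yellow"},
    {"a", "yellow", "banana"},
    {"the", "apple", "is", "red"},
    {"a", "yellow", "apple"},
]

yellow_for_banana = is_discriminative("banana", "apple", "yellow", TOY_SENTENCES)
red_for_banana = is_discriminative("banana", "apple", "red", TOY_SENTENCES)
```

A raw count comparison like this is sensitive to word frequency; a real system would likely normalize the counts or tune a threshold on development data.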