We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions. Our dataset and parser can be found at
We discuss the phenomenon of closest conjunct agreement with a special focus on head-final languages. We present data from two such languages, Hindi and Tsez, which allow agreement with the rightmost conjunct. This contrasts with head-initial languages, such as Arabic, where close conjunct agreement is with the leftmost conjunct in clauses with VS order. This asymmetry raises a number of questions that we will discuss. First, is the typological difference between head-initial and head-final languages in the context of coordination due to a difference in the
We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000-word corpus of English. Attempts to apply the scheme to adpositions and case markers in other languages, as well as some problematic cases in English, have led us to reconsider the assumption that an adposition's lexical contribution is equivalent to the role/relation that it mediates. Our proposal is to embrace the potential for construal in adposition use, expressing such phenomena directly at the token level to manage complexity and avoid sense proliferation. We suggest a framework to represent both the scene role and the adposition's lexical function so they can be annotated at scale (supporting automatic, statistical processing of domain-general language), and discuss how this representation would allow for a simpler inventory of labels.
We describe the CMU systems submitted to the 2014 WMT shared translation task. We participated in two language pairs, German-English and Hindi-English. Our innovations include: a label coarsening scheme for syntactic tree-to-tree translation, a host of new discriminative features, several modules to create "synthetic translation options" that can generalize beyond what is directly observed in the training data, and a method of combining the output of multiple word aligners to uncover extra phrase pairs and grammar rules.
Does language dominance modulate knowledge of case marking in Hindi-speaking bilinguals? Hindi is a split-ergative language with a rich morphological case system. Subjects of transitive perfective predicates are marked with ergative case (-ne). Human specific direct objects, indirect objects, and dative subjects are marked with the particle -ko. We compared knowledge of case marking, in oral production and acceptability judgments, in Hindi–English bilinguals with different dominance patterns: 23 balanced bilinguals and two groups of bilinguals with Hindi as their weaker language, namely 24 L2 learners of Hindi with age of acquisition (AoA) of Hindi in adulthood and 26 Hindi heritage speakers with AoA of Hindi since birth. The balanced bilinguals outperformed the English-dominant bilinguals (the L2 learners and the heritage speakers), who showed a similarly lower command of the Hindi case marking system, with the exception of -ko marking as a function of specificity with direct objects. We consider how dominant language transfer, AoA of Hindi, and input factors may explain the acquisition and knowledge of morphology in Hindi as the weaker language.
This document offers a detailed linguistic description of SNACS (Semantic Network of Adposition and Case Supersenses), an inventory of 50 semantic labels ("supersenses") that characterize the use of adpositions and case markers at a somewhat coarse level of granularity, as demonstrated in the STREUSLE 4.1 corpus (https://github.com/nert-gu/streusle/). Though the SNACS inventory aspires to be universal, this document is specific to English; documentation for other languages will be published separately. Version 2 is a revision of the supersense inventory proposed for English by Schneider et al. (2016; henceforth "v1"), which in turn was based on previous schemes. The present inventory was developed after extensive review of the v1 corpus annotations for English, plus previously unanalyzed genitive case possessives (Blodgett and Schneider, 2018), as well as consideration of adposition and case phenomena in Hebrew, Hindi, Korean, and German. Hwang et al. (2017) present the theoretical underpinnings of the v2 scheme. summarize the scheme, its application to English corpus data, and an automatic disambiguation task.