Archna Bhatia scite author profile

A Dependency Parser for Tweets

We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions. Our dataset and parser can be found at
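For readers unfamiliar with the metric mentioned in the abstract, unlabeled attachment accuracy is commonly computed as the share of tokens whose predicted head matches the gold head. The sketch below uses that textbook definition; the function name and the toy head indices are illustrative assumptions, and the paper's own scoring conventions (for example, which tokens are counted) may differ.

```python
from typing import Sequence


def unlabeled_attachment_score(gold_heads: Sequence[int],
                               predicted_heads: Sequence[int]) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Hypothetical 5-token sentence: each entry is the 1-based index of the
# token's head, with 0 standing for the artificial root.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]

uas = unlabeled_attachment_score(gold, pred)
print(f"UAS = {uas:.2f}")  # 4 of 5 heads match, so this prints "UAS = 0.80"
```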

Closest conjunct agreement in head final languages

Benmamoun, Bhatia, Polinsky. 2009. LIVY.

We discuss the phenomenon of closest conjunct agreement with a special focus on head-final languages. We present data from two such languages, Hindi and Tsez, which allow agreement with the rightmost conjunct. This contrasts with head-initial languages, such as Arabic, where close conjunct agreement is with the leftmost conjunct in clauses with VS order. This asymmetry raises a number of questions that we will discuss. First, is the typological difference between head-initial and head-final languages in the context of coordination due to a difference in the

Erosion of case and agreement in Hindi heritage speakers

Montrul, Bhatt, Bhatia. 2012. LAB.

Recent research has identified several vulnerable areas in heritage language grammars, of which morphosyntax is among the most affected. In this study, we report on the morphosyntactic competence of Hindi heritage speakers living in the U.S. and show that these speakers have representational problems with ergative, accusative and dative case morphology, albeit to different degrees. Hindi is a split ergative language with a complex interaction of case and agreement. Transitive predicates in perfective aspect co-occur with subjects marked with ergative case (-ne) and object agreement. Animate specific direct objects are marked with the particle -ko, and so are the indirect objects and dative subjects. 21 Hindi native speakers and 28 Hindi heritage speakers completed a sociolinguistic questionnaire, a Hindi cloze test, an oral narrative task and a bimodal acceptability judgment task. The results showed significant differences between the fluent native speakers and the heritage speakers on all measures.

Double Trouble: The Problem of Construal in Semantic Annotation of Adpositions

Hwang, Bhatia, Han, et al. 2017.

We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000-word corpus of English. Attempts to apply the scheme to adpositions and case markers in other languages, as well as some problematic cases in English, have led us to reconsider the assumption that an adposition's lexical contribution is equivalent to the role/relation that it mediates. Our proposal is to embrace the potential for construal in adposition use, expressing such phenomena directly at the token level to manage complexity and avoid sense proliferation. We suggest a framework to represent both the scene role and the adposition's lexical function so they can be annotated at scale (supporting automatic, statistical processing of domain-general language) and discuss how this representation would allow for a simpler inventory of labels.

The CMU Machine Translation Systems at WMT 2014

Matthews, Ammar, Bhatia, et al. 2014.

We describe the CMU systems submitted to the 2014 WMT shared translation task. We participated in two language pairs, German-English and Hindi-English. Our innovations include: a label coarsening scheme for syntactic tree-to-tree translation, a host of new discriminative features, several modules to create "synthetic translation options" that can generalize beyond what is directly observed in the training data, and a method of combining the output of multiple word aligners to uncover extra phrase pairs and grammar rules.
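The abstract does not say how the outputs of the word aligners are combined. As a generic illustration only, and not the CMU systems' actual method, one simple scheme is link-level voting: keep an alignment link if at least a chosen number of aligners propose it. With a threshold of 1 this is the union of links, which tends to expose extra phrase pairs; with a threshold equal to the number of aligners it is the intersection. All names and the toy alignments below are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# A link pairs a source token index with a target token index.
Link = Tuple[int, int]


def combine_alignments(alignments: Iterable[Set[Link]],
                       min_votes: int = 1) -> Set[Link]:
    """Keep every link proposed by at least `min_votes` aligners."""
    votes: Dict[Link, int] = {}
    for links in alignments:
        for link in links:
            votes[link] = votes.get(link, 0) + 1
    return {link for link, count in votes.items() if count >= min_votes}


# Hypothetical outputs of two aligners for the same sentence pair.
aligner_a = {(0, 0), (1, 2), (2, 1)}
aligner_b = {(0, 0), (1, 1), (2, 1)}

union = combine_alignments([aligner_a, aligner_b])                # any aligner
agreed = combine_alignments([aligner_a, aligner_b], min_votes=2)  # both aligners

print(sorted(union))   # [(0, 0), (1, 1), (1, 2), (2, 1)]
print(sorted(agreed))  # [(0, 0), (2, 1)]
```

A lower threshold favours recall (more candidate phrase pairs and grammar rules), while a higher one favours precision.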

Case Marking in Hindi as the Weaker Language

et al. 2019

Does language dominance modulate knowledge of case marking in Hindi-speaking bilinguals? Hindi is a split ergative language with a rich morphological case system. Subjects of transitive perfective predicates are marked with ergative case (-ne). Human specific direct objects, indirect objects, and dative subjects are marked with the particle -ko. Using oral production and acceptability judgments, we compared knowledge of case marking in Hindi–English bilinguals with different dominance patterns: 23 balanced bilinguals and two groups of bilinguals with Hindi as their weaker language, namely 24 L2 learners of Hindi with age of acquisition (AoA) of Hindi in adulthood and 26 Hindi heritage speakers with AoA of Hindi since birth. The balanced bilinguals outperformed the English-dominant bilinguals (the L2 learners and the heritage speakers), who showed a similarly lower command of the Hindi case marking system, with the exception of -ko marking as a function of specificity with direct objects. We consider how dominant language transfer, AoA of Hindi, and input factors may explain the acquisition and knowledge of morphology in Hindi as the weaker language.

Adposition and Case Supersenses v2.6: Guidelines for English

Schneider, Hwang, Srikumar, et al. 2017. Preprint.

This document offers a detailed linguistic description of SNACS (Semantic Network of Adposition and Case Supersenses), an inventory of 50 semantic labels ("supersenses") that characterize the use of adpositions and case markers at a somewhat coarse level of granularity, as demonstrated in the STREUSLE 4.1 corpus (https://github.com/nert-gu/streusle/). Though the SNACS inventory aspires to be universal, this document is specific to English; documentation for other languages will be published separately. Version 2 is a revision of the supersense inventory proposed for English by Schneider et al. (2016; henceforth "v1"), which in turn was based on previous schemes. The present inventory was developed after extensive review of the v1 corpus annotations for English, plus previously unanalyzed genitive case possessives (Blodgett and Schneider, 2018), as well as consideration of adposition and case phenomena in Hebrew, Hindi, Korean, and German. Hwang et al. (2017) present the theoretical underpinnings of the v2 scheme. summarize the scheme, its application to English corpus data, and an automatic disambiguation task.

Chapter 10. Comprehension of Differential Object Marking by Hindi heritage speakers

Bhatia, Montrul. 2020.
