This paper describes the first robust approach to automatically labeling clauses with their situation entity type (Smith, 2003), capturing aspectual phenomena at the clause level which are relevant for interpreting both semantics at the clause level and discourse structure. Previous work on this task used a small data set from a limited domain, and relied mainly on words as features, an approach which is impractical in larger settings. We provide a new corpus of texts from 13 genres (40,000 clauses) annotated with situation entity types. We show that our sequence labeling approach using distributional information in the form of Brown clusters, as well as syntactic-semantic features targeted to the task, is robust across genres, reaching accuracies of up to 76%.
This paper describes a new approach to predicting the aspectual class of verbs in context, i.e., whether a verb is used in a stative or dynamic sense. We identify two challenging cases of this problem: when the verb is unseen in training data, and when the verb is ambiguous for aspectual class. A semi-supervised approach using linguistically-motivated features and a novel set of distributional features based on representative verb types allows us to predict classes accurately, even for unseen verbs. Many frequent verbs can be either stative or dynamic in different contexts, which has not been modeled by previous work; we use contextual features to resolve this ambiguity. In addition, we introduce two new datasets of clauses marked for aspectual class.
Argumentative texts have been thoroughly analyzed for their argumentative structure, and recent efforts aim at their automatic classification. This work investigates linguistic properties of argumentative texts and text passages in terms of their semantic clause types. We annotate argumentative texts with Situation Entity (SE) classes, which combine notions from lexical aspect (states, events) with genericity and habituality of clauses. We analyse the correlation of SE classes with argumentative text genres, components of argument structures, and some functions of those components. Our analysis reveals interesting relations between the distribution of SE types and the argumentative text genre, compared to other genres like fiction or report. We also see tendencies in the correlations between argument components (such as premises and conclusions) and SE types, as well as between argumentative functions (such as support and rebuttal) and SE types. The observed tendencies can be deployed for automatic recognition and fine-grained classification of argumentative text passages.
Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, an extension of XNLI (Conneau et al., 2018) to 10 Indigenous languages of the Americas. We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches. Additionally, we explore model adaptation via continued pretraining and provide an analysis of the dataset by considering hypothesis-only models. We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.48%. Continued pretraining offers improvements, with an average accuracy of 43.85%. Surprisingly, training on poorly translated data by far outperforms all other methods with an accuracy of 49.12%.
Active learning has been shown to be effective for reducing human labeling effort in supervised learning tasks, and in this work we explore its suitability for automatic short answer assessment on the ASAP corpus. We systematically investigate a wide range of AL settings, varying not only the item selection method but also size and selection of seed set items and batch size. Comparing to a random baseline and a recently-proposed diversitybased baseline which uses cluster centroids as training data, we find that uncertainty-based sampling methods can be beneficial, especially for data sets with particular properties. The performance of AL, however, varies considerably across individual prompts.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.