This paper presents the results of an annotation study focused on the fine-grained analysis of argumentation structures in scientific publications. Our new annotation scheme specifies four types of binary argumentative relations between sentences, resulting in the representation of arguments as small graph structures. We developed an annotation tool that supports the annotation of such graphs and carried out an annotation study with four annotators on 24 scientific articles from the domain of educational research. For calculating the inter-annotator agreement, we adapted existing measures and developed a novel graphbased agreement measure which reflects the semantic similarity of different annotation graphs.
This paper presents a study on the role of discourse markers in argumentative discourse. We annotated a German corpus with arguments according to the common claim-premise model of argumentation and performed various statistical analyses regarding the discriminative nature of discourse markers for claims and premises. Our experiments show that particular semantic groups of discourse markers are indicative of either claims or premises and constitute highly predictive features for discriminating between them.
Previous models for the assessment of commitment towards a predicate in a sentence (also known as factuality prediction) were trained and tested against a specific annotated dataset, subsequently limiting the generality of their results. In this work we propose an intuitive method for mapping three previously annotated corpora onto a single factuality scale, thereby enabling models to be tested across these corpora. In addition, we design a novel model for factuality prediction by first extending a previous rule-based factuality prediction system and applying it over an abstraction of dependency trees, and then using the output of this system in a supervised classifier. We show that this model outperforms previous methods on all three datasets. We make both the unified factuality corpus and our new model publicly available.
We introduce lemonUby, a new lexical resource integrated in the Semantic Web which is the result of converting data extracted from the existing large-scale linked lexical resource UBY to the lemon lexicon model. The following data from UBY were converted: WordNet, FrameNet, VerbNet, English and German Wiktionary, the English and German entries of Omega-Wiki, as well as links between pairs of these lexicons at the word sense level (links between
Synthesis Lectures on Human Language Technologies is edited by Graeme Hirst of the University of Toronto. e series consists of 50-to 150-page monographs on topics relating to natural language processing, computational linguistics, information retrieval, and spoken language understanding. Emphasis is on important new techniques, on new applications, and on topics that combine two or more HLT subfields.
We present a new supervised framework that learns to estimate automatic Pyramid scores and uses them for optimizationbased extractive multi-document summarization. For learning automatic Pyramid scores, we developed a method for automatic training data generation which is based on a genetic algorithm using automatic Pyramid as the fitness function. Our experimental evaluation shows that our new framework significantly outperforms strong baselines regarding automatic Pyramid, and that there is much room for improvement in comparison with the upperbound for automatic Pyramid.
The Story Cloze test (Mostafazadeh et al., 2016) is a recent effort in providing a common test scenario for text understanding systems. As part of the LSDSem 2017 shared task, we present a system based on a deep learning architecture combined with a rich set of manually-crafted linguistic features. The system outperforms all known baselines for the task, suggesting that the chosen approach is promising. We additionally present two methods for generating further training data based on stories from the ROCStories corpus. Our system and generated data are publicly available on GitHub 1 .
We investigate the automatic assignment of research methods to journal articles from the domain of Social Sciences. We employ Computer Science and Computational Linguistics methodology to perform this automatic assignment of metadata. The multi‐label classification system we present uses only abstracts and titles of journal articles as input. Our best system is able to assign the important research methods empiricaland quantitative empirical with F‐scores of 0.67 and 0.68. These research methods are in the focus of many recent manual analyses of publications databases. Our classification approach could be applied to automatically analyze large publications databases and databases of bibliographic references according to the use of empirical and quantitative empirical methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.