In this paper, we present the final version of a publicly available treebank of Finnish, the Turku Dependency Treebank. The treebank contains 204,399 tokens (15,126 sentences) from 10 different text sources and has been manually annotated in a Finnish-specific version of the well-known Stanford Dependency scheme. The morphological analyses of the treebank have been assigned using a novel machine learning method to disambiguate readings given by an existing tool. As the second main contribution, we present the first open source Finnish dependency parser, trained on the newly introduced treebank. The parser achieves a labeled attachment score of 81%. The treebank data as well as the parsing pipeline are available under an open license at
Several incompatible syntactic annotation schemes are currently used by parsers and corpora in biomedical information extraction. The recently introduced Stanford dependency scheme has been suggested to be a suitable unifying syntax formalism. In this paper, we present a step towards such unification by creating a conversion from the Link Grammar to the Stanford scheme. Further, we create a version of the BioInfer corpus with syntactic annotation in this scheme. We present an application-oriented evaluation of the transformation and assess the suitability of the scheme and our conversion to the unification of the syntactic annotations of BioInfer and the GENIA Treebank. We find that a highly reliable conversion is both feasible to create and practical, increasing the applicability of both the parser and the corpus to information extraction.
We present the Finnish PropBank, a resource for semantic role labeling (SRL) of Finnish based on the Turku Dependency Treebank, whose syntax is annotated in the well-known Stanford Dependency (SD) scheme. The contribution of this paper consists of the lexicon of the verbs and their arguments present in the treebank, as well as the predicate-argument annotation of all verb occurrences in the treebank text. We demonstrate that the annotation is of high quality, that the SD scheme is highly compatible with PropBank annotation, and that the additional dependencies present in the Turku Dependency Treebank are clearly beneficial for PropBank annotation. We also use the PropBank to provide a strong baseline for automated Finnish & Filip Ginter