Jiří Mírovský scite author profile

Jiří Mírovský

5 Publications

86 Citation Statements Received

24 Citation Statements Given


Affiliations

Charles University

Publications


Prague Dependency Treebank

Hajič

Hajičová

Mikulová

et al. 2017


We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), whose purpose, as has always been the case for the family of Prague Dependency Treebanks, is to serve both as training data for various types of NLP tasks and as a resource for linguistically oriented research. PDT-C 1.0 contains four different datasets of Czech, uniformly annotated using the standard PDT scheme (although not everything is annotated manually, as we describe in detail here). The texts come from different sources: daily newspaper articles, Czech translation of the Wall Street Journal, transcribed dialogs, and a small amount of user-generated, short, often non-standard language segments typed into a web translator. Altogether, the treebank contains around 180,000 sentences with their morphological, surface syntactic, and deep syntactic annotation. The diversity of the texts and annotations should serve NLP applications well and makes the treebank an invaluable resource for linguistic research, including comparative studies of texts of different genres. The corpus is publicly and freely available.


The coding scheme for annotating extended nominal coreference and bridging anaphora in the Prague Dependency Treebank

Nedoluzhko

Mírovský

Pajas

2009


The present paper outlines an ongoing project annotating extended nominal coreference and bridging anaphora in the Prague Dependency Treebank. We describe the annotation scheme with respect to the linguistic classification of coreferential and bridging relations, and we also focus on technical details of the annotation process. We present methods of helping the annotators: pre-annotation and several useful features implemented in the annotation tool. Our method of measuring inter-annotator agreement is focused on improving the annotation guidelines; we present the results of three successive measurements of the agreement.


Designing a language game for collecting coreference annotation

Hladká

Mírovský

Schlesinger

2009


CzeDLex – A Lexicon of Czech Discourse Connectives

Mírovský

Synková

Rysová

et al. 2017


CzeDLex is a new electronic lexicon of Czech discourse connectives, planned for publication by the end of this year. Its data format and structure are based on a study of similar existing resources, and adjusted to comply with the Czech syntactic tradition and specifics and with the Prague approach to the annotation of semantic discourse relations in text. In the article, we first put the lexicon in the context of related resources and discuss theoretical aspects of building the lexicon: we present arguments for our choice of the data structure and for selecting features of the lexicon entries, while special attention is paid to a consistent and (as far as possible) uniform encoding of both primary connectives (such as, in English, because, therefore) and secondary connectives (e.g. for this reason, this is the reason why). The main principle adopted for nesting entries in the lexicon is, apart from the lexical form of the connective, the discourse-semantic type (sense) expressed by the given connective, which enables us to deal with a broad formal variability of connectives and is convenient for interlinking CzeDLex with lexicons in other languages. Second, we introduce the chosen technical solution based on the Prague Markup Language, which allows for an efficient incorporation of the lexicon into the family of Prague treebanks: it can be directly opened and edited in the tree editor TrEd, processed from the command line in btred, interlinked with its source corpus, and queried in the PML Tree Query engine. Third, we describe the process of getting data for the lexicon by exploiting a large corpus manually annotated with discourse relations, the Prague Discourse Treebank 2.0: we elaborate on the automatic extraction part, post-extraction checks, and manual addition of supplementary linguistic information.


Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech

Novák

Mírovský

Rysová

et al. 2019
