Miroslav Spousta scite author profile

Miroslav Spousta

4 Publications

51 Citation Statements Received

20 Citation Statements Given

Affiliations

Charles University

Publications

Semi-supervised training for the averaged perceptron POS tagger

Spoustová, Hajič, Raab, et al. 2009

This paper describes POS tagging experiments with semi-supervised training as an extension to the (supervised) averaged perceptron algorithm, first introduced for this task by Collins (2002). Experiments with iterative training on a standard-sized supervised (manually annotated) dataset (10^6 tokens) combined with a relatively modest amount (on the order of 10^8 tokens) of unsupervised (plain) data in a bagging-like fashion showed significant improvement on the POS classification task for typologically different languages, yielding better than state-of-the-art results for English and Czech (4.12% and 4.86% relative error reduction, respectively; absolute accuracies being 97.44% and 95.89%).

Dependency Parsing as a Sequence Labeling Task

Spoustová, Spousta 2010

The aim of this paper is to explore the feasibility of solving the dependency parsing problem using sequence labeling tools. We introduce an algorithm to transform a dependency tree into a tag sequence suitable for a sequence labeling algorithm and evaluate several parameter settings on the standard treebank data. We focus mainly on Czech, a highly inflective free-word-order language that is not easy to parse using traditional techniques, but we also test our approach on English for comparison.
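
The abstract does not say how the tree is turned into tags, so the following is only a minimal sketch of one possible encoding, not the authors' algorithm. It assumes each token is tagged with the signed distance to its head (a relative head offset), with the hypothetical label `ROOT` for the root token. The function names `tree_to_tags` and `tags_to_tree` are illustrative.

```python
from typing import List


def tree_to_tags(heads: List[int]) -> List[str]:
    """Encode a dependency tree as one tag per token.

    heads[i] is the 1-based position of the head of token i+1;
    0 marks the root. Each tag is the signed offset from the token
    to its head (an assumed encoding, chosen for illustration).
    """
    tags = []
    for position, head in enumerate(heads, start=1):
        if head == 0:
            tags.append("ROOT")
        else:
            tags.append(f"{head - position:+d}")
    return tags


def tags_to_tree(tags: List[str]) -> List[int]:
    """Invert tree_to_tags: recover head positions from offset tags."""
    heads = []
    for position, tag in enumerate(tags, start=1):
        heads.append(0 if tag == "ROOT" else position + int(tag))
    return heads


# "She reads long books": "reads" is the root, "She" and "books"
# attach to "reads", and "long" attaches to "books".
example_heads = [2, 0, 4, 2]
example_tags = tree_to_tags(example_heads)
```

Once a tree is expressed this way, any sequence labeler (such as a POS tagger) can be trained to predict the tags, and `tags_to_tree` maps its output back to head positions. A predicted tag sequence is not guaranteed to form a valid tree, so a real system would need an additional repair or validation step.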

Towards the automatic extraction of definitions in Slavic

Przepiórkowski, Degórski, Spousta, et al. 2007

This paper presents the results of preliminary experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from usually unstructured or only weakly structured e-learning texts in Bulgarian, Czech and Polish. The extraction is performed by regular grammars over XML-encoded, morphosyntactically annotated documents. The results are less than satisfying, and we claim that the reason is the intrinsic difficulty of the task, as measured by the low inter-annotator agreement, which calls for more sophisticated, deeper linguistic processing, as well as for the use of machine learning classification techniques.

Integration of Speech and Text Processing Modules into a Real-Time Dialogue System

Ptáček, Ircing, Spousta, et al. 2010
