2017
DOI: 10.31286/jp.97.1.7

Morfeusz 2: an inflectional analyzer and generator for Polish

Keywords: morphological analysis and synthesis, inflection, natural language processing.

Cited by 5 publications (4 citation statements)
References 3 publications
“…Next, texts were tokenized, which resulted in obtaining 141,555 unique tokens. All were reduced to their original form with a lemmatization operation performed with the Morfeusz 2 program (Kieraś & Woliński, 2017) which, to the best of my knowledge, currently allows the most accurate lemmatization of the Polish language. After lemmatization, 52,628 unique tokens remained.…”
Section: Data Preprocessing (citation type: mentioning)
confidence: 99%
“…This dataset lacks non-abbreviation examples. 1) Morfeusz: Morfeusz [16] is a morphological analysis tool for the Polish language. With its help, all the abbreviations with their expanded form were filtered out from the dictionary.…”
Section: B Dictionary-based Additional Data (citation type: mentioning)
confidence: 99%
“…There are many different models capable of lemmatizing texts. I tried four different options - Morfeusz2 [3], spaCy [2], a hybrid of these two and modified Morfeusz2.…”
Section: A BM25 (citation type: mentioning)
confidence: 99%
“…The best solution consisted of two stages. In the first I retrieved 1000 candidate passages using BM25 from Elasticsearch on a corpus and queries lemmatized using Morfeusz2 [3]. In the second stage I calculated different scores and joined them using logistic regression.…”
Section: Introduction (citation type: mentioning)
confidence: 99%