Morfeusz 2 – analizator i generator fleksyjny dla języka polskiego [Morfeusz 2: an inflectional analyzer and generator for Polish]

Kieraś, Witold; Woliński, Marcin

doi:10.31286/jp.97.1.7

Cited by 5 publications

(4 citation statements)

References 3 publications


“…Next, texts were tokenized, which resulted in obtaining 141,555 unique tokens. All were reduced to their original form with a lemmatization operation performed with the Morfeusz 2 program (Kieraś & Woliński, 2017) which, to the best of my knowledge, currently allows the most accurate lemmatization of the Polish language. After lemmatization, 52,628 unique tokens remained.…”

Section: Data Preprocessing (mentioning)

confidence: 99%

Predicting the Amount of Compensation for Harm Awarded by Courts Using Machine-Learning Algorithms

Świtała

2024

Central European Economic Journal


The present study aims to explain and predict the monetary amount awarded by courts as compensation for harm suffered. A set of machine-learning algorithms was applied to a sample of decisions handed down by the Polish common courts. The methodology involved two steps: identification of words and phrases whose counts or frequencies affect the amounts adjudicated with LASSO regression and expert assessment, then applying OLS, again LASSO, random forests and XGBoost algorithms, as well as a BERT approach to make predictions. Finally, an in-depth analysis was undertaken on the influence of individual words and phrases on the amount awarded. The results demonstrate that the size of awards is most strongly influenced by the type of injury suffered, the specifics of treatment, and the family relationship between the harmed party and the claimant. At the same time, higher values are awarded when compensation for material damage and compensation for harm suffered are claimed together or when the claim is extended after it was filed.


“…This dataset lacks non-abbreviation examples. 1) Morfeusz: Morfeusz [16] is a morphological analysis tool for the Polish language. With its help, all the abbreviations with their expanded form were filtered out from the dictionary.…”

Section: B Dictionary-based Additional Data (mentioning)

confidence: 99%

Abbreviation Disambiguation in Polish Press News Using Encoder-Decoder Models

Wróbel, Karbowski, Lewkowicz

2023

Annals of Computer Science and Information Systems


The disambiguation of abbreviations and acronyms is a longstanding problem in Natural Language Processing (NLP) that has garnered significant attention from researchers. Previous approaches have employed statistical methods, semantic similarity metrics, and machine learning algorithms. Various languages and document types have been explored, with English being the most commonly studied language. Recent advances have been driven by the application of pre-trained transformer models. Standardization and addressing the challenges of multilingual and multi-document type disambiguation remain ongoing goals in the field of NLP. This paper presents an in-depth exploration of abbreviation disambiguation using state-of-the-art neural Encoder-Decoder models, specifically the ByT5 and plT5 architectures. Advanced synthetic data generation techniques are introduced and their effect on model performance is analysed. The methods are evaluated in the context of the PolEval abbreviation disambiguation competition, where the authors achieve top ranking.


“…There are many different models capable of lemmatizating texts. I tried four different options - Morfeusz2 [3], spaCy [2], a hybrid of these two and modified Morfeusz2.…”

Section: A BM25 (mentioning)

confidence: 99%

“…The best solution consisted of two stages. In the first I retrieved 1000 candidate passages using BM25 from Elasticsearch 1 on a corpus and queries lemmatized using Morfeusz2 [3]. In the second stage I calculated different scores and joined them using logistic regression.…”

Section: Introduction (mentioning)

confidence: 99%

Passage Retrieval in question answering systems in Polish language

Pacanowska

2023

Annals of Computer Science and Information Systems


This paper describes the submissions to Task 3 of PolEval 2022. Passage Retrieval is the problem of retrieving a passage relevant to a given query. It is an important problem with many practical use cases, especially in question answering. It is very beneficial if a model is generalizable, that is, effective in various domains, even ones it was not trained on. This is a challenge for many state-of-the-art models. In this paper I describe and test many different methods of approaching this problem, from standard techniques, such as BM25 and lemmatization, to recently developed methods based on deep learning and transformers.
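
The first stage mentioned in the citation statement above (BM25 retrieval over a lemmatized corpus and lemmatized queries) can be illustrated with a minimal sketch. This is not the cited author's implementation, which used Elasticsearch and Morfeusz2: the `LEMMAS` table is a hypothetical stand-in for a real lemmatizer, the function names and the `k1` and `b` values are assumptions chosen for illustration, and the second stage (combining scores with logistic regression) is omitted.

```python
import math
from collections import Counter

# Hypothetical toy lemma table; a real pipeline would call a lemmatizer
# such as Morfeusz2 here instead of a hand-written dictionary.
LEMMAS = {
    "koty": "kot", "kota": "kot",
    "psy": "pies", "psa": "pies",
    "domy": "dom", "domu": "dom",
}


def lemmatize(text):
    """Lowercase, split on whitespace, and map each token to its lemma if known."""
    return [LEMMAS.get(token, token) for token in text.lower().split()]


def bm25_rank(query, passages, k1=1.5, b=0.75, top_n=3):
    """Return (passage_index, score) pairs for the top_n passages by BM25 score."""
    docs = [lemmatize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages that contain each lemma.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(lemmatize(query))

    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append((idx, score))

    # Highest score first; Python's sort is stable, so ties keep corpus order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


passages = ["koty lubią mleko", "psy lubią spacery", "domy stoją przy ulicy"]
ranking = bm25_rank("kota", passages)
```

Because both the query form "kota" and the passage form "koty" are mapped to the lemma "kot", the first passage receives a positive score even though the surface forms differ, which is the reason for lemmatizing an inflected language before lexical retrieval.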



Keywords (Słowa kluczowe): morphological analysis and synthesis, inflection, natural language processing.
