Piotr Rychlik scite author profile

Piotr Rychlik

3 Publications

28 Citation Statements Received

24 Citation Statements Given

Affiliations

Polish Academy of Sciences, Institute of Computer Science, University of Warsaw

Publications

Testing word embeddings for Polish

Mykowiecka, Marciniak, Rychlik (2017)

Distributional Semantics postulates the representation of word meaning in the form of numeric vectors which represent words which occur in context in large text data. This paper addresses the problem of constructing such models for the Polish language. The paper compares the effectiveness of models based on lemmas and forms created with Continuous Bag of Words (CBOW) and skipgram approaches based on different Polish corpora. For the purposes of this comparison, the results of two typical tasks solved with the help of distributional semantics, i.e. synonymy and analogy recognition, are compared. The results show that it is not possible to identify one universal approach to vector creation applicable to various tasks. The most important feature is the quality and size of the data, but different strategy choices can also lead to significantly different results.

Recognition of irrelevant phrases in automatically extracted lists of domain terms

Mykowiecka, Marciniak, Rychlik (2018)

TERM

In our paper, we address the problem of recognition of irrelevant phrases in terminology lists obtained with an automatic term extraction tool. We focus on identification of multi-word phrases that are general terms or discourse expressions. We defined several methods based on comparison of domain corpora and a method based on contexts of phrases identified in a large corpus of general language. The methods were tested on Polish data. We used six domain corpora and one general corpus. Two test sets were prepared to evaluate the methods. The first one consisted of many presumably irrelevant phrases, as we selected phrases which occurred in at least three domain corpora. The second set mainly consisted of domain terms, as it was composed of the top-ranked phrases automatically extracted from the analyzed domain corpora. The results show that the task is quite hard as the inter-annotator agreement is low. Several tested methods achieved similar overall results, although the phrase ordering varied between methods. The most successful method, with a precision of about 0.75 on half of the tested list, was the context-based method using a modified contextual diversity coefficient. Although the methods were tested on Polish, they seem to be language-independent.

Terminology/Keyphrase Extraction for Creation of Book Indexes in Polish

Marciniak, Mykowiecka, Rychlik (2021)
