Michał Woźniak scite author profile

Michał Woźniak

5 Publications

18 Citation Statements Received

28 Citation Statements Given


Affiliations

The Institute of the Polish Language of the Polish Academy of Sciences

Publications


Linguistic measures of chemical diversity and the “keywords” of molecular collections

Woźniak

Wołos

Modrzyk

et al. 2018

Sci Rep


Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections (“corpora”), including those deposited on the Internet; indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting the most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry based on characteristic “chemical words” that span more than traditional functional groups and, instead, look at common structural fragments that molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular “keywords” by which such collections are best characterized and annotated.


Korpus języka mówionego mieszkańców Spisza

et al. 2019


A Spoken Corpus of Inhabitants of Polish Spisz

The article describes a dialect corpus project that documents the dialect of Polish Spisz. In contrast to the majority of dialectological research in Poland, our corpus also includes the speech of the youngest and middle generations, as its aim is also to document the sociolinguistic situation of the dialect of the region. Recordings have been transcribed into standard Polish orthography, not phonetically, which makes it possible not only to easily search the corpus but also to use existing tools to lemmatize and add morphosyntactic annotation to the texts. Users interested in the phonetic layer can access the recordings on a per-utterance basis. The article describes the stages of compiling the corpus and discusses its potential applications. The authors argue that a large corpus which covers a small, homogeneous area is a more valuable resource for dialectologists than a series of small corpora documenting a larger region.


Korpus Polsko-Niemiecki Uniwersytetu Warszawskiego i Uniwersytetu Gutenberga (PolGerCorp)

Łaziński

Meger

Woźniak

2022


The aim of the article is to present PolGerCorp, a new Polish-German corpus compiled in 2018–2021 as part of a project of the University of Warsaw and the Gutenberg University in Mainz. The corpus contains 10 million words in texts dating from 1750 to 2020. The original Polish and German texts, together with their translations, represent literary prose, non-fiction, the press, and legal texts. The texts have been tagged, lemmatized, and aligned sentence by sentence. The article presents an interface that is friendly to non-specialist users, along with its search options. The tagging of aspectual counterparts offers a unique opportunity to search within aspect pairs while taking into account the various formal types of aspect markers.


A szepességi lengyel nyelvjárás korpusznyelvészeti elemzése

Grochola-Szczepanek

Waldenfels

Górski

et al. 2021

dh-hun


The study presents the project that created a corpus of the dialect of Polish Spisz. Unlike the majority of dialectological research in Poland, our corpus also includes the speech of the young and middle generations, since its aim is also to document the dialect and the sociolinguistic situation of the region. The recordings were transcribed in standard Polish orthography rather than phonetically, which not only makes the corpus easy to search but also makes it possible to lemmatize the texts and add morphosyntactic annotation using existing tools. Users interested in phonetics can access the recordings sentence by sentence. The article describes the steps of compiling the corpus and discusses possible applications. The authors argue that a large corpus covering a small, homogeneous area is a much more valuable resource for dialectologists than a series of smaller corpora documenting a larger region.


Conjunct Lengths in English, Dependency Length Minimization, and Dependency Structure of Coordination

Przepiórkowski

Woźniak

2023
