2013
DOI: 10.4312/slo2.0.2013.2.58-81

Named entity recognition in Slovene text (Razpoznavanje imenskih entitet v slovenskem besedilu)

Abstract: The article presents an algorithm and the implementation of a program that recognizes names in Slovene text using machine learning. The supervised approach, based on conditional random fields, is trained on the annotated ssj500k corpus. In this corpus, which is freely available under the Creative Commons CC-BY-NC-SA license, word tokens carry morphosyntactic tags and lemmas and are also annotated with organization names, personal names, geographical names, and proper names of things (stvarna imena). The article presents the effect on recognition accuracy of using morphosyntactic…
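The abstract describes a conditional random field tagger that draws on word forms, lemmas, and morphosyntactic tags. As a rough illustration of what per-token input to such a tagger could look like, the sketch below builds one feature dictionary per token. The feature names, the context window of one token, and the tag strings `Nx`, `Vx`, `Sx` are all hypothetical placeholders, not the paper's actual feature set or real ssj500k tag values. No CRF library is called; the dictionaries would still need to be passed to a CRF toolkit of your choice.

```python
from typing import Dict, List, Tuple

# A token is (word form, lemma, morphosyntactic tag).
# The tag strings in the example are placeholders, not real ssj500k tags.
Token = Tuple[str, str, str]


def token_features(sentence: List[Token], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    word, lemma, msd = sentence[i]
    features: Dict[str, object] = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalization is a strong name cue
        "word.isupper": word.isupper(),
        "suffix3": word[-3:].lower(),
        "lemma": lemma.lower(),
        "msd": msd,
        "msd.pos": msd[:1],               # first character as a coarse category
    }
    # Add the left neighbour, or mark the beginning of the sentence.
    if i == 0:
        features["BOS"] = True
    else:
        prev_word, _, prev_msd = sentence[i - 1]
        features["prev.word.lower"] = prev_word.lower()
        features["prev.msd.pos"] = prev_msd[:1]
    # Add the right neighbour, or mark the end of the sentence.
    if i == len(sentence) - 1:
        features["EOS"] = True
    else:
        next_word, _, next_msd = sentence[i + 1]
        features["next.word.lower"] = next_word.lower()
        features["next.msd.pos"] = next_msd[:1]
    return features


def sentence_features(sentence: List[Token]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in the sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


example: List[Token] = [
    ("Janez", "Janez", "Nx"),
    ("živi", "živeti", "Vx"),
    ("v", "v", "Sx"),
    ("Ljubljani", "Ljubljana", "Nx"),
]
feats = sentence_features(example)
```

Including the lemma alongside the inflected form is one plausible way to let a model see that `Ljubljani` and `Ljubljana` are the same name; whether this matches the paper's design is not stated in the abstract.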

Cited by 7 publications (4 citation statements)
References 3 publications
“…In 2010, the NEWS Workshop included a shared task on Transliteration Mining (Kumaran et al., 2010), i.e., mining of names from parallel corpora, in English, Chinese, Tamil, Russian, and Arabic. Research on NER focusing on Slavic languages includes NER for Croatian (Karan et al., 2013; Ljubešić et al., 2013); NER in Croatian tweets (Baksa et al., 2017); a manually annotated NE corpus for Croatian (Agić and Ljubešić, 2014); NER in Slovene (Štajner et al., 2013; Ljubešić et al., 2013); a Czech corpus of 11K annotated NEs (Ševčíková et al., 2007); NER for Czech (Konkol and Konopík, 2013); tools and resources for fine-grained annotation of NEs in the National Corpus of Polish (Waszczuk et al., 2010; Savary and Piskorski, 2011); lemmatization of NEs for Polish (Piskorski et al., 2009; Marcińczuk, 2017). Shared tasks on NER for Polish were organized under the umbrella of POLEVAL (Ogrodniczuk and Kobyliński, 2018, 2020) and LESZCZE campaigns.…”
Section: Related Work (mentioning)
confidence: 99%
“…Research on NE focusing on Slavic languages includes NE recognition for Croatian (Karan et al., 2013; Ljubešić et al., 2013), NE recognition in Croatian tweets (Baksa et al., 2017), a manually annotated NE corpus for Croatian (Agić and Ljubešić, 2014), NE recognition in Slovene (Štajner et al., 2013; Ljubešić et al., 2013), a Czech corpus of 11K annotated NEs (Ševčíková et al., 2007), NER for Czech (Konkol and Konopík, 2013), tools and resources for fine-grained annotation of NEs in the National Corpus of Polish (Waszczuk et al., 2010; Savary and Piskorski, 2011), NER shared tasks for Polish organized under the umbrella of POLEVAL (Ogrodniczuk and Kobyliński, 2018, 2020) and LESZCZE campaigns, recent shared tasks on NE Recognition in Russian (Starostin et al., 2016; Artemova et al., 2022), the latter utilizing the NEREL dataset (a Russian dataset for named entity recognition and relation extraction, described in Loukachevitch et al., 2021)…”
Section: Prior Work (mentioning)
confidence: 99%
“…Approximately a third of ssj500k 2.2 (9,478 sentences) was manually annotated with named entity annotations in the WebAnno tool, with the aim of developing a named entity extractor for the Slovene language based on machine learning [20]. The annotation distinguished five types of NE: Person (per), Person Derivative (deriv-per), Location (loc), Organization (org), and Miscellaneous (misc).…”
Section: Named Entities (mentioning)
confidence: 99%
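The last citation statement lists five entity types: per, deriv-per, loc, org, and misc. Sequence taggers are commonly trained on per-token labels, so a short sketch of turning entity spans into such labels may help. The BIO (begin/inside/outside) scheme used here is an assumption for illustration; the statement does not say how the annotations are encoded. The function name `spans_to_bio`, the span layout, and the example sentence are likewise hypothetical.

```python
from typing import List, Tuple

# The five entity types named in the citation statement.
NE_TYPES = ("per", "deriv-per", "loc", "org", "misc")

# A span is (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert entity spans to one BIO label per token (assumed encoding)."""
    labels = ["O"] * n_tokens
    for start, end, ne_type in spans:
        if ne_type not in NE_TYPES:
            raise ValueError(f"unknown entity type: {ne_type}")
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end)}")
        if any(label != "O" for label in labels[start:end]):
            raise ValueError(f"overlapping span: {(start, end)}")
        # The first token of an entity gets B-, the remaining tokens get I-.
        labels[start] = f"B-{ne_type}"
        for k in range(start + 1, end):
            labels[k] = f"I-{ne_type}"
    return labels


# Illustrative sentence, not taken from the corpus.
tokens = ["Univerza", "v", "Ljubljani", "je", "v", "Sloveniji"]
spans = [(0, 3, "org"), (5, 6, "loc")]
bio = spans_to_bio(len(tokens), spans)
```

With this encoding the two occurrences of `v` receive different labels: the first is inside the organization name, the second is outside any entity.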