2013
DOI: 10.1007/978-3-642-40585-3_20

CRF-Based Czech Named Entity Recognizer and Consolidation of Czech NER Research

Cited by 47 publications (34 citation statements)
References 9 publications
“…For comparison with state of the art, Czech PDT UD 2.2 treebank without gold segmentation and tokenization is used in evaluation, according to the CoNLL 2018 shared task training and evaluation protocol. Our system reuses segmentation and tokenization produced by UDPipe 2.0 in the CoNLL [20] -79.00 -- Straková et al (2013) [33] 79.23 82.82 -- Straková et al (2016) [36] 81.20 84.68 79.23 82.78 Table 6. Named entity recognition results (F1) on the Czech Named Entity Corpus.…”
Section: Pos Tagging Lemmatization and Dependency Parsing On Univers
mentioning
confidence: 99%
“…carry sentiment and how their presence influences classification accuracy. For these experiments, we employ a CRF-based named entity recognizer (Konkol & Konopík, 2013) and replace the words identified as entities with their respective entity type (e.g., McDonald's becomes company). This preprocessing has not been widely discussed in the literature devoted to document-level sentiment analysis, but Boiy and Moens (2009), for example, remove the 'entity of interest' in their approach.…”
Section: Preprocessing
mentioning
confidence: 99%
“…Prior work targeting NEs specifically for Slavic languages includes tools for NE recognition for Croatian (Karan et al, 2013; Ljubešić et al, 2013), a tool tailored for NE recognition in Croatian tweets (Baksa et al, 2017), a manually annotated NE corpus for Croatian (Agić and Ljubešić, 2014), tools for NE recognition in Slovene (Štajner et al, 2013; Ljubešić et al, 2013), a Czech corpus of 11,000 manually annotated NEs (Ševčíková et al, 2007), NER tools for Czech (Konkol and Konopík, 2013), tools and resources for fine-grained annotation of NEs in the National Corpus of Polish (Waszczuk et al, 2010; Savary and Piskorski, 2011) and a recent shared task on NE Recognition in Russian.…”
Section: Introduction
mentioning
confidence: 99%
“…In 2010, the NEWS Workshop included a shared task on Transliteration Mining (Kumaran et al, 2010), i.e., mining of names from parallel corpora. This task included corpora in English, Chinese, Tamil, Russian, and Arabic. Prior work targeting NEs specifically for Slavic languages includes tools for NE recognition for Croatian (Karan et al, 2013; Ljubešić et al, 2013), a tool tailored for NE recognition in Croatian tweets (Baksa et al, 2017), a manually annotated NE corpus for Croatian (Agić and Ljubešić, 2014), tools for NE recognition in Slovene (Štajner et al, 2013; Ljubešić et al, 2013), a Czech corpus of 11,000 manually annotated NEs (Ševčíková et al, 2007), NER tools for Czech (Konkol and Konopík, 2013), tools and resources for fine-grained annotation of NEs in the National Corpus of Polish (Waszczuk et al, 2010; Savary and Piskorski, 2011) and a recent shared task on NE Recognition in Russian. To the best of our knowledge, the shared task described in this paper is the first attempt at multilingual name recognition, normalization, and cross-lingual entity matching that covers a large number of Slavic languages. This paper is organized as follows.…”
mentioning
confidence: 99%