2017
DOI: 10.3233/sw-170253

Extending a CRF-based named entity recognition model for Turkish well formed text and user generated content

Abstract: Named entity recognition (NER), which provides useful information for many high level NLP applications and semantic web technologies, is a well-studied topic for most of the languages and especially for English. However the studies for Turkish, which is a morphologically richer and lesser-studied language, have fallen behind these for a long while. In recent years, Turkish NER intrigued researchers due to its scarce data resources and the unavailability of high-performing systems. Especially, the need to disco…

Cited by 38 publications (21 citation statements)
References 32 publications
“…Consequently, they obtain an F1 score of 48.96% on the noisy dataset. Seker and Eryigit (2017) present the state-of-the-art model on Turkish NER which is another CRF-based model, similar to that of Çelikkaya et al (2013). The authors use an extensive set of morphological and lexical features (e.g., stem, part-of-speech (POS) tags, capitalization, word type and shape flags) and gazetteers.…”
Section: NER on Turkish Noisy Data (mentioning)
confidence: 92%
“…At present, CRF has been proved to be a better result in the biomedical field, and the machine learning method represented by CRF is called the mainstream. Şeker and Eryiğit presented a CRF‐based NER system, which successfully models the morphologically very rich nature of this language; its extensions to expand the covered named entity types and also to process extra challenging user generated content coming with Web 2.0. The highest accuracy was achieved, ie, 0.82.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our first rendition of the MWE annotations also included named entities, but it was decidedly rather primitive. In accordance with more recent studies that aimed to establish annotation standards for MWEs and named entities [11,17,18], we supplied the current releases of IMST and IWT with more comprehensive and systematic MWE annotations, all manually annotated.…”
Section: New Language Resources (mentioning)
confidence: 99%
“…The ITU-METU-Sabancı Treebank (IMST) [9] was later developed as the output of our research project [10], as a reannotated version of the METU-Sabancı Treebank, following a revised annotation framework. IMST proved to be a robust resource [9], despite being a relatively young treebank [11]. Another new resource is the ITU Web Treebank (IWT) [12], which is the first Turkish web treebank and one of the first fully annotated treebanks of user-generated content worldwide, following its international predecessors, the Google English Web Treebank [13] and the French Social Media Bank [14].…”
Section: Introduction (mentioning)
confidence: 99%