Proceedings of the Second Workshop for NLP Open Source Software (NLP-OSS), 2020
DOI: 10.18653/v1/2020.nlposs-1.11

KLPT – Kurdish Language Processing Toolkit

Abstract: Despite the recent advances in applying language-independent approaches to various natural language processing tasks thanks to artificial intelligence, some language-specific tools are still essential to process a language in a viable manner. The Kurdish language is a less-resourced language with a notable diversity in dialects and scripts, and it lacks basic language processing tools. To address this issue, we introduce a language processing toolkit to handle such a diversity in an efficient way. Our toolkit is compos…

Cited by 17 publications (22 citation statements).
References 46 publications (16 reference statements).

“…As the Kurdish language is one of the less-resourced Indo-European languages and written using different scripts, it still lacks digital text resources to enable NLP applications [7]. The Kurdish scripts lack standardized orthographies and create differences in writing words, particularly compound forms [8].…”
Section: Natural Language Processing (mentioning, confidence 99%)
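A related, lower-level source of variation is the character encoding itself: Sorani text typed on Arabic or Persian keyboards commonly mixes code points for what a reader sees as the same letter. The sketch below is a minimal character normalizer under that assumption. The substitution table and the function name `normalize_sorani` are hypothetical and are not taken from KLPT's own normalization rules.

```python
import unicodedata

# Hypothetical substitution table (an assumption, not KLPT's rules): Arabic
# code points that often stand in for the letters Sorani orthography uses.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize_sorani(text: str) -> str:
    """Map common character variants in Sorani text to one canonical form."""
    text = unicodedata.normalize("NFC", text)
    # Heh + zero-width non-joiner is a common stand-in for Sorani AE (U+06D5)
    text = text.replace("\u0647\u200C", "\u06D5")
    # Drop tatweel (kashida), which only stretches letters visually
    text = text.replace("\u0640", "")
    return text.translate(CHAR_MAP)


# Arabic kaf and yeh in, Sorani keheh and Farsi yeh out
print(normalize_sorani("\u0643\u062A\u064A\u0628") == "\u06A9\u062A\u06CC\u0628")  # True
```

Ordering matters slightly: the multi-character heh + ZWNJ sequence is rewritten before the single-character table is applied, so the two rules cannot interfere.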
“…The tokenization is the fundamental step in the NLP pipeline, which is required for the advanced steps such as part-of-speech tagging, syntactic analysis, and machine translation [9]. The Kurdish language has prefixes and suffixes attaching to a lemma, therefore it has a complex structure in terms of morphology, especially in the Sorani dialect.…”
Section: Kurdish Language Characteristics (mentioning, confidence 99%)
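To illustrate the affix structure described in the statement above, here is a minimal affix-stripping sketch for Sorani. The affix lists, the `MIN_STEM` threshold, and the function name `strip_affixes` are hypothetical simplifications chosen for illustration; this is not how KLPT's stemmer or lemmatizer works.

```python
# Hypothetical, deliberately tiny affix lists (assumptions, not KLPT's lexicon).
PREFIXES = ["دە", "نە"]  # verbal prefix de-, negation ne-
SUFFIXES = ["ەکان", "ەکە", "ێک", "ان"]  # definite plural, definite, indefinite, plural
MIN_STEM = 2  # never strip an affix if fewer than 2 characters would remain


def strip_affixes(word: str) -> str:
    """Strip at most one prefix and one suffix, trying the longest suffix first."""
    stem = word
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM:
            stem = stem[len(prefix):]
            break
    # Longest-first, so the definite plural is not cut down to the bare plural
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
            stem = stem[: -len(suffix)]
            break
    return stem


print(strip_affixes("کتێبەکان"))  # کتێب ("the books" -> "book")
print(strip_affixes("کتێبێک"))    # کتێب ("a book" -> "book")
```

Because nothing checks the result against a lexicon, the sketch over-stems: دەست (dest, "hand") loses its first two letters and becomes ست. This is why practical Kurdish stemmers and lemmatizers rely on lexical resources rather than blind affix removal.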
“…Thus, pre-processing would be challenging since the language has progressed in NLP. Luckily, we could use the python KLPT toolkit developed by Ahmadi (2020). The libraries on KLPT helped us with normalization, standardization, and tokenization.…”
Section: Introduction (mentioning, confidence 99%)
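The tokenization step mentioned in the statement above can be approximated, as a first pass, with a single regular expression. The sketch below does not use KLPT; the pattern and the function name `simple_tokenize` are illustrative assumptions, and it makes no attempt at the compound or multiword handling that Kurdish orthographic variation calls for.

```python
import re

# Words are runs of Unicode word characters, plus ZWNJ (U+200C), which Sorani
# text can contain inside a word; any other non-space character is its own token.
TOKEN_RE = re.compile(r"[\w\u200C]+|[^\w\s]")


def simple_tokenize(text: str) -> list[str]:
    """Split text into word tokens and single-character punctuation tokens."""
    return TOKEN_RE.findall(text)


print(simple_tokenize("کتێبەکان، باشن؟"))
# ['کتێبەکان', '،', 'باشن', '؟']
```

Python's `\w` is Unicode-aware for `str` patterns, so Arabic-script letters such as ە and ێ are matched without listing them, while the Arabic comma and question mark fall through to the punctuation branch.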
“…For the Central Kurdish language, a number of works have been done on NLP. There have been a number of studies on spell-checking and stemming [12] [13] [14] [15] [16], and [17] produce Kurdish Language Processing Toolkit (KLPT) their toolkit is composed of basic elements like text pre-processing, tokenization, stemming, transliteration and lemmatization. There is an n-gram-based document classifier [18], also with respect to their efforts, there are some small efforts to create a lexicon and corpus for Kurdish [19], [20].…”
Section: Introduction (mentioning, confidence 99%)