Kashmir Part of Speech Tagger Using CRF

Lawaye, Aadil Ahmad; Purkayastha, Bipul Syam

doi:10.15373/22501991/mar2014/11

Cited by 4 publications

(4 citation statements)

References 6 publications


“…for which enormous data is available in digital form, Kashmiri language is data deficient. After exploring various resources (Trilingual (English-Hindi-Kashmiri) E-Dictionary (12), Kashmiri WordNet (13), dataset used in (14) and other resources), we managed a raw corpus comprising of about 500K tokens. The overall corpus contains text from different domains like Sports, culture, science etc.…”

Section: Raw Corpus (mentioning)

confidence: 99%


Building Kashmiri Sense Annotated Corpus and its Usage in Supervised Word Sense Disambiguation

Mir, Lawaye, Rana et al. 2023, IJST

Objectives: In this research work, a maiden attempt is made towards developing a sense annotated corpus for Kashmiri Lexical Sample Word Sense Disambiguation (WSD). A sense annotated dataset is required to use supervised WSD techniques, which are the most effective techniques to carry out WSD. As developing a sense-tagged dataset is an arduous task, such datasets are not available for all natural languages. Kashmiri, being computationally a low-resource language, does not have a sense-tagged corpus available for research purposes. Methods: To develop the sense annotated dataset we selected 60 commonly used ambiguous Kashmiri words and annotated the dataset using the manual annotation method. The usefulness of the dataset is also examined by implementing machine learning algorithms (k-NN, Decision Tree (DT) and Support Vector Machine (SVM)) on it. Part of Speech (PoS) and Bag of Words (BoW) features are used to train the classifiers. Findings: The performance of the machine learning algorithms for Kashmiri WSD is evaluated using the accuracy metric. Out of the different classifiers used, SVM showed the best performance with an average accuracy of 75.74%. Novelty: This research is the first attempt to develop a sense-tagged dataset for the Kashmiri language. The developed dataset would be of great importance to the research community and can be used in various Natural Language Processing tasks like WSD and part-of-speech tagging.


“…The overall corpus contains text from different domains like Sports, culture, science etc. Using PoS tagger created in research effort (14) the whole corpus is PoS tagged with an accuracy of 94%.…”

Section: Raw Corpus (mentioning)

confidence: 99%

Building Kashmiri Sense Annotated Corpus and its Usage in Supervised Word Sense Disambiguation

Mir, Lawaye, Rana et al. 2023, IJST

“…Kashmiri language mainly spoken by the people of the Kashmiri and is morphologically very rich but no dataset is available for research purpose which poses a great challenge in this study. Dataset used in this study is collected from Kashmiri WordNet, dataset used in [18], Trilingual Sense Dictionary [19]. In addition, sentences are manually entered using keyboard.…”

Section: Data Collection (mentioning)

confidence: 99%

Towards Developing Word Sense Disambiguation System for Kashmiri Language

Mir, Lawaye 2023, SMSJ

Background: A word, phrase, sentence or other communication is “ambiguous” if it can be interpreted in multiple ways. The process of assigning the correct meaning to a word with respect to its context is known as Word Sense Disambiguation (WSD). WSD is intended to be a very imperious problem in Natural Language Processing (NLP) that requires proper attention as it impacts the performance of various NLP applications. Objectives: In this paper, a first attempt is made to propose a supervised machine learning Kashmiri WSD system. Material & Methods: The dataset comprising 500K tokens for this research study has been collected from different resources. A sense annotated corpus for fifty commonly used ambiguous Kashmiri words has been created using the manual annotation method. Kashmiri WordNet is used to extract senses for the target words. A decision-tree based classifier is trained using the features extracted from the annotated corpus for carrying out the WSD task. We have used a context window of ±3 to extract features that are used to train the classifier. Results: The proposed system is tested on all fifty target words and evaluation is carried out using accuracy, precision, recall and F1 measures. The proposed system reported 81.831% accuracy, 0.834 precision, 0.816 recall and 0.824 F1-measure. Conclusions: This was the initial step towards developing the WSD system for Kashmir and it has shown good results. In the future we expect to use other algorithms to carry out this task with greater language coverage.


“…However, by increasing the training data the POS-Tagger may result in better performance as was evident by its result summary. The system performance got raised from 67.22% to 81.10% by varying the training data size from 15000 to 27000 (27,28).…”

Section: Part-of-speech (Pos) Tagging (mentioning)

confidence: 99%

Natural Language Processing Resources for the Kashmiri Language

Lone, Giri, Bashir 2022, IJST

Objectives: The main objective of this paper, as a maiden attempt, is to identify the basic resources necessary for undertaking Natural Language Processing (NLP) specific research activities pertaining to the Kashmiri language. The paper also deliberates on key issues related to Natural Language Processing of the Kashmiri language, such as complex linguistic phenomena, the lack of standard linguistic tools, documented as well as standardized resources, and the influence of some dominant languages, mostly Urdu and English, on the Kashmiri language. Methods: As there is no substantial work reported in the literature specific to NLP of the Kashmiri language, a holistic research strategy was adopted to explore the possible sources as potential means for creation of basic resources to undertake NLP research for the Kashmiri language. Findings: After thorough investigation, it was observed that there has been some trivial work reported in the literature related to Machine Translation of the Kashmiri language. Further, there are few newspapers published in the Kashmiri language which can be used as a means for creation of a Kashmiri corpus. Moreover, crowdsourcing could be used as a potential means for development of digital linguistic resources for the Kashmiri language. Novelty: The present study is a maiden attempt towards identification of NLP resources for the Kashmiri language and will be of immense importance for the research community interested in working on the development of the Kashmiri language in the digital domain.
