2018
DOI: 10.1016/j.dib.2018.05.062

Unicode-8 based linguistics data set of annotated Sindhi text

Abstract: The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set. It was developed to address natural language processing (NLP) and linguistics problems for the Sindhi language. The data set presents information on the grammatical and morphological structure of Sindhi text as well as the sentiment polarity of Sindhi lexicons. The data set may therefore be used for information retrieval, machine translation, lexicon analysis, language modeling analysis, grammatical and morphological analysis, …

Cited by 8 publications (3 citation statements)
References 2 publications (2 reference statements)
“…States that it takes a lot of resources to create sentiment lexicons for multiple languages or mixed codes. Emotion detection [2], [3] plays a vital role in accomplishing several tasks [4], such as behavior recognition, etc. The raw text written in any language must be annotated before using natural language processing algorithms to extract linguistic aspects [5].…”
Section: Data Description (mentioning)
confidence: 99%
“…The data set contains information on the grammatical and morphological structure of Sindhi language texts, as well as the sentiment polarity of Sindhi lexicons. As a result, data sets can be used for information retrieval, machine translation, lexicon analysis, language modelling analysis, grammatical and morphological analysis, and sentiment and semantic analysis [20].…”
Section: Literature Review (mentioning)
confidence: 99%
“…Multiple placements of dots were observed and these were sometimes below, above, inside and in between the characters. Dootio and Wagan [12,13] worked on the NLP and reported in their research that Sindhi script had many classes and characteristics of Sindhi corpus. A lot of work is in English script and NLP tools are offered in English scripts which perform all tasks of English script, but in the Sindhi language, no powerful application is available for the feature extraction and corpus.…”
Section: Related Work (mentioning)
confidence: 99%