Unicode-8 based linguistics data set of annotated Sindhi text

Dootio, Mazhar Ali; Wagan, Asim Imdad

doi:10.1016/j.dib.2018.05.062

Cited by 8 publications

(3 citation statements)

References 2 publications

(2 reference statements)

“…States that it takes a lot of resources to create sentiment lexicons for multiple languages or mixed codes. Emotion detection [[2], [3]] plays a vital role in accomplishing several tasks [4], such as behavior recognition, etc. The raw text written in any language must be annotated before using natural language processing algorithms to extract linguistic aspects [5].…”

Section: Data Description (mentioning)

confidence: 99%

Dataset construction to detect human behavior with the help of emotions, sentiments and mood for Roman Urdu

Samreen, Ali

2024

Data in Brief

“…The data set contains information on the grammatical and morphological structure of Sindhi language texts, as well as the sentiment polarity of Sindhi lexicons. As a result, data sets can be used for information retrieval, machine translation, lexicon analysis, language modelling analysis, grammatical and morphological analysis, and sentiment and semantic analysis [20].…”

Section: Literature Review (mentioning)

confidence: 99%

Speech to Text by Using the Sindhi Language

Bux, Khan, Bakhsh

2022

IJAIMS

We live in an era of technology, and technological advancement is growing exponentially. In Pakistan, especially in Sindh, many people prefer speaking to writing. Many are unfamiliar with computing and with other global languages, so they face difficulties typing text and then converting it into Sindhi. This is especially true in offices and organizations where Sindhi is the first language for speaking and typing, and where drafting consumes too much time. Users also struggle with tasks such as finding correctly spelled words. People with medical conditions and disabilities would also benefit from such a tool. This tool can handle these difficulties and address the problems discussed. This project aims to develop a web-based application that tries to overcome the disadvantages of other available applications. The application is generic, meaning it may work for speakers of a specific regional language in any country in the world. The main objective of the work presented in this report is to develop an enterprise, open platform for the nation. We use a Convolutional Neural Network and an API in the development phase. Advanced Python libraries detect the user's speech, which is then converted into text.

“…Multiple placements of dots were observed and these were sometimes below, above, inside and in between the characters. Dootio and Wagan [12,13] worked on the NLP and reported in their research that Sindhi script had many classes and characteristics of Sindhi corpus. A lot of work is in English script and NLP tools are offered in English scripts which perform all tasks of English script, but in the Sindhi language, no powerful application is available for the feature extraction and corpus.…”

Section: Related Work (mentioning)

confidence: 99%

Romanized Sindhi Rules for Text Communication

Sodhar, Jalbani, Channa, et al.

2021

Mehran Univ. res. j. eng. technol.

Sindhi is a historical language used around the world, especially in the province of Sindh, Pakistan. The Sindhi language has its own script, which is written from right to left. The use of different Sindhi platforms is now increasing, especially for communication. Most people in Sindh province read, write and speak the language very well, but they face problems in text communication on these platforms. Users of computers and mobile phones have difficulty typing text messages, tweets and comments in the Sindhi script. Natural Language Processing (NLP) is one of the better options for solving these text communication problems. To address them, Romanized Sindhi text is used instead of Sindhi-script text. Romanized text is easier to write because the Sindhi script requires a special keyboard, whereas Romanized text does not. This paper defines rules for writing Romanized Sindhi text that make the text easier to write and understand. The Romanized Sindhi Rules (RSR) are simple, make the meaning of the text easy to understand, and enable fast text communication. This study also supports further research on Romanized Sindhi text using different approaches.
