Abstract: Contextual text feature extraction and classification play a vital role in the multi-document summarization process. Natural language processing (NLP) is one of the essential text mining tools used to preprocess and analyze large document sets. Most conventional single-document feature extraction measures do not account for contextual relationships among the different contextual feature sets in the document categorization process. Also, these conventional word embedding models such as TF-I…
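To illustrate why count-based features ignore context, here is a minimal bag-of-words sketch using plain term frequencies. It is not the paper's proposed model; the function name and example sentences are hypothetical. Two sentences with opposite meanings map to the same vector because word order is discarded.

```python
from collections import Counter


def term_frequency_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in `text` (bag of words)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Two sentences with the same words in a different order.
doc_a = "dog bites man"
doc_b = "man bites dog"

vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))
vec_a = term_frequency_vector(doc_a, vocabulary)
vec_b = term_frequency_vector(doc_b, vocabulary)

print(vocabulary, vec_a, vec_b)  # ['bites', 'dog', 'man'] [1, 1, 1] [1, 1, 1]
```

Identical vectors for "dog bites man" and "man bites dog" show the loss of contextual relationships that the abstract attributes to conventional measures.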
“…This model combines online and manual approaches. This type of model is commonly used to create rules for language analysis and is a popular NLP technique for performing different tasks in different languages, as it is easier to understand and its results are based on ground truth values [19][20]. Fig.…”
Section: Rule-based Model For Labelling Of Romanized Sindhi Text (mentioning)
Sindhi is one of the most ancient languages in the world, and it has its own written and spoken forms. After a rigorous review, it was found that much research has been done in other languages, but word-by-word labelling of Sindhi had not yet been done. In this study, word labelling was performed on 100 sentences of Romanized Sindhi text using an online Python tool. The dataset was collected from different sources, including Sindhi newspapers, blogs, and social media webpages. A rule-based model was applied to this dataset for Parts-of-Speech (POS) tagging of the Romanized Sindhi sentences. A total of 624 words of Romanized Sindhi text were tested and successfully tagged by the SindhiNLP tool: 482 words were tagged as nouns and pronouns, 92 as verbs, and 50 as determiners.
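The abstract does not describe SindhiNLP's rules, so the following is only a minimal sketch of how a rule-based POS tagger of this kind can work: ordered rules (lexicon lookup, a numeral check, and a default-to-noun fallback) are applied to each token, and the tags are tallied into the coarse groups the abstract reports. The lexicon and its approximate English glosses, the tag names, and the functions `tag_word` and `tag_sentence` are all hypothetical illustrations.

```python
from collections import Counter

# Hypothetical toy lexicon (approximate glosses); the actual SindhiNLP
# word lists and rules are not given in the abstract.
LEXICON = {
    "hi": "DET",      # "this"
    "kitab": "NOUN",  # "book"
    "aahe": "VERB",   # "is"
    "maan": "PRON",   # "I"
}

# Coarse groups matching the categories reported in the abstract.
GROUPS = {"NOUN": "noun/pronoun", "PRON": "noun/pronoun",
          "VERB": "verb", "DET": "determiner", "NUM": "numeral"}


def tag_word(token: str) -> str:
    """Assign a POS tag to one token by applying ordered rules."""
    key = token.lower()
    if key in LEXICON:   # Rule 1: exact lexicon match
        return LEXICON[key]
    if key.isdigit():    # Rule 2: numerals
        return "NUM"
    return "NOUN"        # Rule 3: default unknown words to noun


def tag_sentence(sentence: str) -> list[tuple[str, str]]:
    """Split on whitespace, strip punctuation, and tag each token."""
    tokens = [w.strip(".,!?") for w in sentence.split()]
    return [(t, tag_word(t)) for t in tokens if t]


tagged = tag_sentence("Hi kitab aahe.")
print(tagged)  # [('Hi', 'DET'), ('kitab', 'NOUN'), ('aahe', 'VERB')]
print(Counter(GROUPS[tag] for _, tag in tagged))
```

Running the tagger over a whole corpus and summing the group counts gives tallies in the same form as the 482 / 92 / 50 breakdown above.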
“…Sentiment analysis is the analysis of users' opinions [3,4]. The principal role of artificial intelligence (AI) in NLP is to measure content and investigate its importance [5]. The information or text used for Natural Language Processing can be unstructured or structured [6].…”
Sentiment analysis is an important part of natural language processing (NLP). This study evaluated the sentiment of Romanized Sindhi Text (RST) using a hybrid approach and ground truth values. The sentiment analysis methodology involves three major steps: (1) data input, (2) processing on the tool, and (3) data analysis and evaluation of the results. One hundred RST sentences, each of which can be positive, neutral, or negative, were used in the sentiment analysis. The statements in the study's corpus are simple to understand and are used in everyday life. The research used an online Python tool to process the text and obtain the results. The results showed that 86% of the sentences have neutral sentiment, 9% have negative sentiment, and only 5% have positive sentiment. The accuracy on RST, measured with an online confusion-matrix calculator against the ground truth values, was 87.02%, giving an error rate of 12.98%.
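The accuracy and error figures follow from a confusion matrix in which rows are ground-truth labels and columns are predicted labels: accuracy is the diagonal sum divided by the total, and the error rate is 1 − accuracy (so 100% − 87.02% = 12.98%). The sketch below assumes a three-class positive/neutral/negative scheme and uses five hypothetical labels, not the study's data; the function names are illustrative. It also computes the label distribution in percent, the form in which the abstract reports its results.

```python
from collections import Counter

LABELS = ["positive", "neutral", "negative"]


def confusion_matrix(truth, predicted, labels=LABELS):
    """Build a matrix where rows are ground truth and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(truth, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of items on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def distribution(predicted, labels=LABELS):
    """Percentage of predictions falling into each label."""
    counts = Counter(predicted)
    return {label: 100 * counts[label] / len(predicted) for label in labels}


# Hypothetical labels for five sentences (not the study's data).
truth = ["neutral", "neutral", "negative", "positive", "neutral"]
predicted = ["neutral", "neutral", "neutral", "positive", "neutral"]

m = confusion_matrix(truth, predicted)
acc = accuracy(m)
print(m)  # [[1, 0, 0], [0, 3, 0], [0, 1, 0]]
print(f"accuracy = {acc:.2%}, error rate = {1 - acc:.2%}")  # accuracy = 80.00%, error rate = 20.00%
print(distribution(predicted))  # {'positive': 20.0, 'neutral': 80.0, 'negative': 0.0}
```

Because the error rate is defined as the complement of accuracy, the two figures always sum to 100%, which is why a separate error calculation adds no new information.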