This paper addresses Twitter sentiment analysis through a hybrid approach in which a SentiWordNet (SWN)-based feature vector serves as input to a Support Vector Machine (SVM) classifier. Our main focus is handling negation, a lexical modifier, during SWN score calculation in order to improve classification performance. We therefore present naive and novel shift approaches in which negation acts as both a sentiment-bearing word and a modifier, and we then shift the SWN scores of words based on their contextual semantics, inferred from neighbouring words. Additionally, we augment the negation-handling procedure with a few heuristics for cases in which the presence of a negation word does not necessarily signal negation. Experimental results show that the contextual SWN feature vector obtained through the shift polarity approach alone yields an improved Twitter sentiment analysis system that outperforms the traditional reverse polarity approach by 2–6%. We validate the effectiveness of our hybrid approach, including negation handling, on the benchmark Twitter corpus from the SemEval-2013 Task 2 competition.
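To make the difference between the two negation strategies concrete, here is a minimal Python sketch. It assumes a tiny hypothetical lexicon (`LEXICON`) standing in for SentiWordNet positive/negative scores, a fixed negation scope of three tokens, a shift amount of 0.5, and one illustrative heuristic for non-negating bigrams (`no doubt`, `not only`); none of these values come from the paper. In "reverse" mode the negation word is only a modifier and flips the sign of the following scores; in "shift" mode it also contributes its own (assumed) negative score, and the following scores are moved by a fixed amount toward the opposite polarity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for SentiWordNet: word -> (pos, neg).
LEXICON: Dict[str, Tuple[float, float]] = {
    "good": (0.75, 0.0),
    "great": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "not": (0.0, 0.625),  # assumed: the negation word carries its own negative score
}
NEGATIONS = {"not", "no", "never"}
# Assumed heuristic: bigrams in which a negation word does not actually negate.
NON_NEGATING = {("no", "doubt"), ("not", "only")}
PUNCT = {".", ",", "!", "?"}  # punctuation closes a negation scope
SCOPE = 3    # assumed: a negation affects at most the next 3 tokens
SHIFT = 0.5  # assumed: fixed shift toward the opposite polarity


def word_score(word: str) -> float:
    """Net polarity of a word (positive minus negative score)."""
    pos, neg = LEXICON.get(word, (0.0, 0.0))
    return pos - neg


def sentence_score(tokens: List[str], mode: str = "shift") -> float:
    """Sum word scores, handling negation by 'reverse' or 'shift'."""
    if mode not in ("reverse", "shift"):
        raise ValueError(f"unknown mode: {mode}")
    total = 0.0
    scope_left = 0  # tokens remaining in the current negation scope
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in PUNCT:
            scope_left = 0
            continue
        if tok in NEGATIONS:
            if (tok, nxt) in NON_NEGATING:
                continue  # heuristic: not a real negation, ignore the word
            scope_left = SCOPE
            if mode == "shift":
                total += word_score(tok)  # negation is also sentiment-bearing
            continue
        s = word_score(tok)
        if scope_left > 0:
            scope_left -= 1
            if mode == "reverse":
                s = -s  # traditional approach: flip the polarity
            elif s > 0:
                s -= SHIFT  # move positive scores toward negative
            elif s < 0:
                s += SHIFT  # move negative scores toward positive
        total += s
    return total
```

With these assumed values, `this is not good` scores -0.75 under reverse and -0.375 under shift: the shifted score stays negative but milder, which is the kind of contextual adjustment a shift strategy aims for.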
The explosive growth of spatial data has created demand for Spatial Data Mining (SDM) technology, which has emerged as an innovative area of spatial data analysis. A Geographical Information System (GIS) contains heterogeneous data from multidisciplinary sources in different formats. A geodatabase is the repository of GIS data, representing spatial attributes with respect to location. Rapidly increasing satellite imagery and geodatabases generate huge volumes of data about the real world and natural resources such as soil, water, temperature, vegetation and forest cover. Computational algorithms have made inferring information from geodatabases increasingly valuable. This paper introduces GIS and spatial data mining, surveys GIS and SDM tools, algorithmic approaches, and open issues and challenges, and discusses the role of spatial association rule mining in big GIS data.
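As an illustration of spatial association rule mining, the sketch below derives a spatial predicate (`close_to(water)`) from coordinates and then computes support and confidence for single-antecedent rules. The cell data, water locations, 2.0-unit distance threshold and predicate names are all hypothetical, and the brute-force enumeration stands in for a proper level-wise algorithm such as Apriori; it is not the method of any specific tool surveyed in the paper.

```python
import math
from itertools import permutations
from typing import FrozenSet, List, Set, Tuple

# Hypothetical water-body locations and distance threshold for "close_to(water)".
WATER = [(0.0, 0.0), (10.0, 10.0)]
NEAR = 2.0


def spatial_predicates(x: float, y: float, attrs: Set[str]) -> FrozenSet[str]:
    """Attach a derived spatial predicate to a cell's thematic attributes."""
    preds = set(attrs)
    if min(math.dist((x, y), w) for w in WATER) <= NEAR:
        preds.add("close_to(water)")
    return frozenset(preds)


def mine_rules(
    transactions: List[FrozenSet[str]], min_sup: float, min_conf: float
) -> List[Tuple[str, str, float, float]]:
    """Brute-force rules A -> B returned as (A, B, support, confidence)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(a in t for t in transactions)
        n_ab = sum(a in t and b in t for t in transactions)
        if n_a and n_ab / n >= min_sup and n_ab / n_a >= min_conf:
            rules.append((a, b, n_ab / n, n_ab / n_a))
    return rules


# Hypothetical grid cells: (x, y, thematic attributes).
cells = [
    (0.5, 0.5, {"dense_vegetation"}),
    (1.0, 0.0, {"dense_vegetation"}),
    (9.0, 10.0, {"dense_vegetation"}),
    (5.0, 5.0, {"bare_soil"}),
    (6.0, 4.0, {"bare_soil"}),
    (10.0, 11.0, {"bare_soil"}),
]
transactions = [spatial_predicates(x, y, a) for x, y, a in cells]
rules = mine_rules(transactions, min_sup=0.3, min_conf=0.7)
```

On this toy data the sketch finds `close_to(water) -> dense_vegetation` (support 0.5, confidence 0.75) and the reverse rule (support 0.5, confidence 1.0); the weak `bare_soil`/`close_to(water)` co-occurrence falls below the support threshold.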
Part-of-speech (POS) tagging is the process of assigning each word in a text its corresponding part of speech. In its most basic form, POS tagging identifies words as nouns, verbs, adjectives, and so on. POS tagging is a prominent tool for processing natural languages, and it is one of the simplest and most stable statistical models used in many NLP applications. It is an early stage in many linguistic and text-analysis tasks, such as information retrieval, machine translation, text-to-speech synthesis and information extraction. In POS tagging, we assign a part-of-speech tag to each word in a sentence or text. Various approaches have been proposed to implement POS taggers. In this paper we present a POS tagger for Marathi, a morphologically rich language spoken natively in Maharashtra. The tagger is developed with statistical methods: Unigram, Bigram, Trigram and HMM. The paper explains each algorithm with suitable examples and introduces a tag set that can be used for tagging Marathi text. We describe the development of the taggers and compare their outputs to assess accuracy. The four Marathi POS taggers, viz. Unigram, Bigram, Trigram and HMM, achieve accuracies of 77.38%, 90.30%, 91.46% and 93.82% respectively.
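The statistical taggers compared above can be illustrated with a minimal sketch of a unigram baseline and a bigram HMM tagger decoded with the Viterbi algorithm, using add-one smoothing. The three-sentence corpus of romanized Marathi words and the `PRON`/`NOUN`/`VERB` tags are hypothetical placeholders, not the paper's data or tag set; a trigram tagger would condition each transition on the two previous tags instead of one.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy corpus of romanized Marathi with placeholder tags.
CORPUS: List[List[Tuple[str, str]]] = [
    [("mi", "PRON"), ("ghari", "NOUN"), ("jato", "VERB")],
    [("to", "PRON"), ("shalet", "NOUN"), ("jato", "VERB")],
    [("ti", "PRON"), ("ghari", "NOUN"), ("yete", "VERB")],
]


def train(corpus):
    """Count tag transitions (from a start symbol) and word emissions."""
    trans: Dict[str, Counter] = defaultdict(Counter)
    emit: Dict[str, Counter] = defaultdict(Counter)
    for sent in corpus:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def unigram_tag(words, emit, default="NOUN"):
    """Baseline: each word gets the tag it was seen with most often."""
    out = []
    for w in words:
        seen = {t: c[w] for t, c in emit.items() if c[w] > 0}
        out.append(max(seen, key=seen.get) if seen else default)
    return out


def viterbi_tag(words, trans, emit):
    """Bigram HMM decoding in log space with add-one smoothing."""
    if not words:
        return []
    tags = sorted(emit)
    vocab_size = len({w for c in emit.values() for w in c})

    def log_t(prev, tag):
        row = trans[prev]
        return math.log((row[tag] + 1) / (sum(row.values()) + len(tags)))

    def log_e(tag, word):  # the extra +1 reserves probability for unseen words
        row = emit[tag]
        return math.log((row[word] + 1) / (sum(row.values()) + vocab_size + 1))

    # best[t] = (log-probability of the best path ending in tag t, that path)
    best = {t: (log_t("<s>", t) + log_e(t, words[0]), [t]) for t in tags}
    for w in words[1:]:
        best = {
            t: max(
                (best[p][0] + log_t(p, t) + log_e(t, w), best[p][1] + [t])
                for p in tags
            )
            for t in tags
        }
    return max(best.values())[1]


trans, emit = train(CORPUS)
```

The unigram baseline ignores context entirely, while the HMM can still assign a sensible tag to an unseen word from its neighbours' transitions, which is consistent with the accuracy ordering the abstract reports.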
Urdu draws on several languages, including Arabic, Hindi, English, Turkish and Sanskrit, and has a complex and rich morphology. This is one reason why relatively little work has been done on Urdu language processing. Stemming converts a word to its root form by separating its prefixes and suffixes from the word. It is useful in search engines, natural language processing, word processing, spell checkers, word parsing, and word frequency and count studies. This paper presents a rule-based stemmer for Urdu, intended for use in information retrieval. We evaluated its results by verifying them with a human expert.
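A rule-based stemmer of this kind can be sketched as longest-match affix stripping with a minimum stem length. The prefix and suffix lists below are a small illustrative sample chosen for this example, not the paper's rule set, and `MIN_STEM` is an assumed guard against over-stemming short words.

```python
from typing import List

# Illustrative affix lists only (assumed for this sketch, not the paper's rules).
PREFIXES: List[str] = ["بے", "نا"]          # be- ("without"), na- ("not")
SUFFIXES: List[str] = ["یاں", "یں", "وں"]   # plural endings -iyan, -ein, -on
MIN_STEM = 2  # assumed minimum number of characters a stem must keep


def stem(word: str) -> str:
    """Strip at most one prefix and one suffix, longest match first."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[: -len(s)]
            break
    return word
```

For example, `کتابیں` (kitaabein, "books") reduces to `کتاب`, and `بےکار` (bekaar) to `کار`. The minimum-length guard is what stops `ناک` (naak, "nose") from being wrongly reduced by the `نا` prefix rule.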
For a long time, corporations have been looking for knowledge sources that provide a structured description of data and focus on meaning and shared understanding: structures that support the open-world assumption, are flexible enough to recognize more than one name for an entity, and whose main purpose is to facilitate human communication and interoperability. Databases fail to provide these features, and ontologies have emerged as an alternative. However, corporations working in the same domain tend to build different ontologies, which causes problems when they want to share their data or knowledge. We therefore need tools that find correspondences between ontologies so that they can be merged into one; this task is termed ontology matching. It is an emerging area, and there is still a long way to go before an ideal matcher that consistently produces good results exists. In this paper we present a graph-based framework for matching ontologies.
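One simple way to use graph structure in matching is to combine each concept pair's label similarity with the similarity of their neighbours, then select one-to-one correspondences greedily. The sketch below is a simplified, similarity-flooding-style illustration under assumed parameters (`alpha`, `threshold`) that uses `difflib.SequenceMatcher` for label similarity; it is not the framework proposed in the paper. Each ontology is assumed to be given as an undirected adjacency map from concept names to neighbouring concepts, where every neighbour is also a key of the map.

```python
from difflib import SequenceMatcher
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # concept -> neighbouring concepts


def label_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(g1: Graph, g2: Graph, alpha: float = 0.7, threshold: float = 0.6) -> Dict[str, str]:
    """Blend label and neighbour similarity, then pick 1:1 pairs greedily."""
    base = {(a, b): label_sim(a, b) for a in g1 for b in g2}
    scores: Dict[Tuple[str, str], float] = {}
    for (a, b), s in base.items():
        if g1[a] and g2[b]:
            # average, over a's neighbours, of their best label match among b's neighbours
            nb = sum(max(base[(x, y)] for y in g2[b]) for x in g1[a]) / len(g1[a])
        else:
            nb = 0.0
        scores[(a, b)] = alpha * s + (1 - alpha) * nb
    result: Dict[str, str] = {}
    used: Set[str] = set()
    for (a, b), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s >= threshold and a not in result and b not in used:
            result[a] = b
            used.add(b)
    return result


# Hypothetical toy ontologies.
g1: Graph = {"Vehicle": {"Car", "Truck"}, "Car": {"Vehicle"}, "Truck": {"Vehicle"}}
g2: Graph = {"Vehicles": {"Automobile", "Lorry"}, "Automobile": {"Vehicles"}, "Lorry": {"Vehicles"}}
```

On these toy graphs only `Vehicle -> Vehicles` clears the 0.6 threshold: `Car`/`Automobile` and `Truck`/`Lorry` gain structural support from their similar parents, but their labels are too dissimilar for string similarity alone, which is where a synonym resource would help.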