Sentiment Analysis on Twitter data is a challenging problem due to the nature, diversity and volume of the data. People tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast number of opinions on a wide spectrum of topics. This information offers huge potential and can be harnessed to determine the sentiment tendency towards these topics. However, since no one can invest an infinite amount of time in reading through these tweets, an automated decision-making approach is necessary. Most existing solutions are limited to centralized environments and can therefore process at most a few thousand tweets. Given the massive number of tweets published daily, such a sample is not representative enough to determine the sentiment polarity towards a topic. In this work, we develop two systems: the first in the MapReduce framework and the second in the Apache Spark framework for programming with Big Data. The algorithm exploits all hashtags and emoticons inside a tweet as sentiment labels and classifies diverse sentiment types in a parallel and distributed manner. Moreover, the sentiment analysis tool is based on Machine Learning methodologies alongside Natural Language Processing techniques and utilizes Apache Spark's machine learning library, MLlib. To address the nature of Big Data, we introduce pre-processing steps that improve the results of Sentiment Analysis, as well as Bloom filters that compact the storage size of intermediate data and boost the performance of our algorithm. Finally, the proposed system was trained and validated with real data crawled from Twitter, and, through an extensive experimental evaluation, we show that our solution is efficient, robust and scalable while confirming the quality of our sentiment identification.
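The abstract above does not say how its Bloom filters are built, so the following is only a minimal sketch of the general technique, not the authors' implementation. It shows how a set of string features can be stored in a fixed-size bit array at the cost of occasional false positives. The class name `BloomFilter`, its parameters, and the use of SHA-256 with double hashing are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative sketch only)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        self.num_bits = max(
            8,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical usage: remember which hashtag/emoticon features have been seen.
seen_features = BloomFilter(expected_items=1000)
for feature in ["#happy", ":)", "#fail"]:
    seen_features.add(feature)
```

A filter like this never reports a stored item as absent, which is why it can stand in for a larger exact set when a small false-positive rate is acceptable.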
Abstract. The current work focuses on the implementation of a robust multibit watermarking algorithm for digital images, based on an innovative analysis of spread spectrum techniques. The paper presents the watermark embedding and detection algorithms, which use both wavelets and the Discrete Cosine Transform, and analyzes the issues that arise.
Abstract. Web 2.0 has facilitated interactive information sharing on the WWW, giving users the opportunity to articulate their opinions on different topics. In this context, information monitoring systems are used to create digests and reports on keywords and thematic queries regarding opinions on government decisions. These systems usually analyze rubric associations and perform a primary semantic and statistical interpretation of the texts. It is, however, rather difficult to obtain accurate predictions and to adequately estimate the strength of forum users' opinions. In this work we present a methodology that automatically mines and estimates the strength of users' opinions in text forums regarding government decisions. According to our methodology, quantitative features are automatically mined from forum posts and then passed to a Support Vector Machine based classifier, which estimates the users' opinion strength. The proposed methodology has been validated on real data, and initial experimental results are presented.
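The abstract above does not list the quantitative features it mines, so the sketch below only illustrates the general idea of turning a forum post into a numeric feature vector that a classifier such as an SVM could consume. The function name `extract_features` and every feature in it (token count, punctuation counts, share of all-caps words) are hypothetical choices, not the features used in the paper.

```python
import re


def extract_features(post: str) -> dict:
    """Map a forum post to simple numeric features (hypothetical feature set)."""
    tokens = re.findall(r"[A-Za-z']+", post)
    num_tokens = len(tokens)
    # Count words written entirely in capitals, ignoring single letters such as "I".
    shouted = sum(1 for t in tokens if len(t) > 1 and t.isupper())
    return {
        "num_tokens": num_tokens,
        "exclamations": post.count("!"),
        "questions": post.count("?"),
        "uppercase_ratio": shouted / num_tokens if num_tokens else 0.0,
    }


# Hypothetical usage on a single post.
features = extract_features("This decision is TERRIBLE!!! Why?")
```

In a full pipeline, vectors like this would be computed for every labeled post and used to train the classifier; that training step is omitted here because the abstract gives no details about it.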