A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis

Symeonidis, Symeon; Effrosynidis, Dimitrios; Arampatzis, Avi

doi:10.1016/j.eswa.2018.06.022

Cited by 180 publications

(134 citation statements)

References 24 publications


“…The various benchmark datasets used in the past decade were WePS-3, 27 SemEval, 30,52,54,55,73,75,76,85 tweets prepared by Stanford University, 34,45,46,75 SNAP, 40 Sanders Twitter Sentiment Corpus (denoted as Sanders), 44,55,75,79 2008 Presidential Debate Corpus, 44,75,79 Sentiment140, 51 RepLab 2012, 53 RepLab 2013, 53 STS-manual, 55 Gold Standard personality labeled Twitter dataset, 59 Cleveland Heart Disease data, 69 STS-Gold, 73 [Figure 6: Distribution of papers in accordance to the digital libraries (expressed in percentages)] Many reported researches were carried on the tweets fetched directly from Twitter using its API. The tweets were from a variety of domains, topics and time period (referred as topic specific/topic oriented tweets).…”

Section: Widely Used Datasets and Domains In Which The Studies For

mentioning

confidence: 99%

“…Accuracy (A): It is defined as proximity of a measurement to its true value. It is calculated as a proportion of TP and true negatives (TN) among total inspected cases. 17,29,40,43-45,48-50,52,55,57,60,64,66,68-70,72-75,77-79,81-83,85…”

mentioning

confidence: 99%
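
The accuracy definition quoted above can be written out as a short calculation. The following is a minimal sketch, assuming a binary confusion matrix; the function name `accuracy` and the example counts are illustrative and do not come from the cited review.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of correctly classified cases (TP + TN) among all inspected cases."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no cases inspected")
    return (tp + tn) / total


# Hypothetical confusion-matrix counts, for illustration only.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
```

With these made-up counts, 85 of the 100 inspected cases are classified correctly.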

“…The SemEval 2015 54,83 is available at http://alt.qcri.org/semeval2015/task11 and consists of tweets which are enriched with metaphors and ironical content. Similarly, SemEval series provides SemEval 2016 85 and SemEval 2017 85 set of tweets as well belonging to varied topics such as consumer products, environmental issues, etc. Another benchmark dataset 34,45,46,75 SNAP dataset 40 is a collection of 1,600,000 positive and negative tweets prepared by the "Stanford University" which focus on the "general purpose smiley tweets."…”

mentioning

confidence: 99%


Systematic literature review of sentiment analysis on Twitter using soft computing techniques

Kumar

Jaiswal

2019

Concurrency and Computation

131


Sentiment detection and classification is the latest fad for social analytics on Web. With the array of practical applications in healthcare, finance, media, consumer markets, and government, distilling the voice of public to gain insight to target information and reviews is non-trivial. With a marked increase in the size, subjectivity, and diversity of social web-data, the vagueness, uncertainty and imprecision within the information has increased manifold. Soft computing techniques have been used to handle this fuzziness in practical applications. This work is a study to understand the feasibility, scope and relevance of this alliance of using Soft computing techniques for sentiment analysis on Twitter. We present a systematic literature review to collate, explore, understand and analyze the efforts and trends in a well-structured manner to identify research gaps defining the future prospects of this coupling. The contribution of this paper is significant because firstly the primary focus is to study and evaluate the use of soft computing techniques for sentiment analysis on Twitter and secondly as compared to the previous reviews we adopt a systematic approach to identify, gather empirical evidence, interpret results, critically analyze, and integrate the findings of all relevant high-quality studies to address specific research questions pertaining to the defined research domain.

KEYWORDS: machine learning, review, sentiment analysis, soft computing, Twitter

INTRODUCTION

The incessantly evolving dynamics of the Web in terms of the volume, velocity and variety of opinion-rich information accessible online, has made research in the domain of Sentiment Analysis (SA) a trend for many practical applications which facilitate decision support and deliver targeted information to domain analysts. Interestingly, the buzzing term "big data" which is estimated to be 90% unstructured 1 further makes it crucial to tap and analyze information using contemporary tools.

Text mining models define the process to transform and substitute this unstructured data into a structured one for knowledge discovery. Use of classification algorithms to intelligently mine text has been studied extensively across literature. 2,3 SA, established as a typical text classification task, 4 is defined as the computational study of people's opinions, attitudes and emotions towards an entity. 5,6 It offers a technology-based solution to understand people's reactions, views and opinion polarities (positive, negative or neutral) in textual content available over social media sources.

Research studies and practical applications in the field of SA have escalated in the past decade with the transformation and expansion of Web from passive provider of content to an active socially-aware distributor of collective intelligence. This new collaborative Web (called Web 2.0), 7 extended by Web-based technologies like comments, blogs and wikis, social media portals like Twitter or Facebook, that allow to build social networks based on professional relationship, i...


“…1. Pre-processing: the procedure of cleaning and preparing the texts that will be classified [16]. It also aims to reduce the volume of data [12,17].…”

Section: Sentiment Analysis

unclassified

“…Some of the pre-processing techniques include removing symbols and non-textual characters (characteristic of unstructured text), expanding abbreviations, replacing contractions, removing numbers, removing stopwords (prepositions, articles, and connectives that serve to link one word to another and add no meaning to the sentence [17]), and reducing each word to its stem (stemming) [18], thereby reducing the variations of the same word (plural, gerund, inflected verbs, augmentative, diminutive, nouns, among others). [20,9], Linear discriminant analysis (LDA) [9,21], Naïve Bayes (NB) [16,22,23], Random Forest [24,25], nearest neighbors (KNN) [22], Multi-layer Perceptron (MLP) [16,26,13]. Sohrabi and Hemmatian [12] proposed a system that uses SVM and ANN for polarity recognition.…”

Section: Sentiment Analysis

unclassified
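
The pre-processing steps listed in the statement above can be sketched as a small pipeline. This is only an illustration under stated assumptions: it targets English text rather than the Portuguese tweets of the cited work, the lookup tables and the names `preprocess` and `toy_stem` are hypothetical, the step order is one plausible choice, and the suffix stripper is a toy stand-in for a real stemmer.

```python
import re

# Illustrative lookup tables; every entry is an assumption, not taken from the cited work.
ABBREVIATIONS = {"u": "you", "gr8": "great", "pls": "please"}
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "it", "in", "on"}
SUFFIXES = ("ing", "ed", "s")  # toy stemmer, far cruder than a real one


def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet: str) -> list[str]:
    text = tweet.lower()
    # Replace contractions first, while the apostrophe is still present.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Expand abbreviations token by token, before digits are removed ("gr8").
    text = " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())
    # Remove URLs, then numbers, then any remaining non-letter symbols.
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"\d+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    # Drop stopwords and stem what is left.
    return [toy_stem(tok) for tok in text.split() if tok not in STOPWORDS]


print(preprocess("I don't like the 2 new shoes, pls fix!!! http://t.co/x"))
```

Contractions and abbreviations are handled before symbols and digits are stripped, since both lookups depend on characters that the later steps would delete.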

Identificação de Polaridade de Sentimento no Twitter Aplicada à Indústria Calçadista

Silva

Adami

2019

SCI


There are several works reported in the scientific literature on sentiment classification using messages extracted from the Twitter platform. However, no work was found specifically focused on the Portuguese language for the footwear area. The article shows how it is possible to recognize consumer opinion (positive or negative) from tweets regarding the footwear industry, using machine learning. A footwear company from southern Brazil was used for evaluation. We collected texts from Twitter and pre-processed them by cleaning irrelevant terms, extracting features to obtain measurements, and differentiating polarity. Finally, classifiers were used to identify the class to which each example under analysis belongs. The classifiers used were Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), Random Forest, Nearest Neighbors (KNN) and Linear Discriminant Analysis (LDA). The results showed that the best classifier for this type of problem was the MLP, which obtained a specificity of 78.5%, a sensitivity of 95.6% and an accuracy of 86.0%.


From Social Media to Expert Reports: The Impact of Source Selection on Automatically Validating Complex Conceptual Models of Obesity

Sandhu

Giabbanelli

Mago

2019

Social Computing and Social Media. Design, Human Behavior and Analytics
