Effective Text Data Preprocessing Technique for Sentiment Analysis in Social Media Data

Pradha, Saurav; Halgamuge, Malka N.; Vinh, Nguyễn Trần Quốc

doi:10.1109/kse.2019.8919368

Cited by 77 publications

(37 citation statements)

References 8 publications


“…In this study, we reviewed and selected a text data preprocessing technique. Ten preprocessing techniques are frequently used [33]. The data preprocessing has four steps: data cleansing, similar word matching, stop word removal, and tokenization.…”

Section: Methods (mentioning)

confidence: 99%

“…A stop word removal step was then performed. Stop words are common words with no semantics and do not aggregate relevant information to the task, such as “the” and “a” [33]. Lastly, the tokenization step divides each accident situation description sentence into token units, which are small chunks such as words and attached parts of speech.…”

Section: Methods (mentioning)

confidence: 99%


Scenario-Mining for Level 4 Automated Vehicle Safety Assessment from Real Accident Situations in Urban Areas Using a Natural Language Process

Park, Jeong, et al. 2021

Sensors

As the research and development activities of automated vehicles have been active in recent years, developing test scenarios and methods has become necessary to evaluate and ensure their safety. Based on the current context, this study developed an automated vehicle test scenario derivation methodology using traffic accident data and a natural language processing technique. The natural language processing technique-based test scenario mining methodology generated 16 functional test scenarios for urban arterials and 38 scenarios for intersections in urban areas. The proposed methodology was validated by determining the number of traffic accident records that can be explained by the resulting test scenarios. That is, the resulting test scenarios are valid and represent a matching rate between the test scenarios and the increased number of traffic accident records. The resulting functional scenarios generated by the proposed methodology account for 43.69% and 27.63% of the actual traffic accidents for urban arterial and intersection scenarios, respectively.
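The citation statements from Park, Jeong, et al. quoted above name data cleansing, stop word removal, and tokenization as preprocessing steps. The following is a minimal Python sketch of those three steps, not the cited authors' implementation. The function names, the cleansing rules, and the tiny `STOP_WORDS` set are assumptions made for illustration, the similar word matching step is omitted because the statements do not describe it, and tokenization is done before stop word removal here purely for convenience.

```python
import re

# Hypothetical, deliberately tiny stop word list (an assumption for this
# sketch); a real pipeline would use a fuller, language-specific list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "was"}


def clean(text: str) -> str:
    """Data cleansing: lowercase, drop URLs, drop non-alphanumeric characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # remove punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


def tokenize(text: str) -> list[str]:
    """Tokenization: split cleaned text into whitespace-separated word tokens."""
    return text.split()


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Stop word removal: keep only tokens that are not in STOP_WORDS."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text: str) -> list[str]:
    """Run cleansing, tokenization, and stop word removal in sequence."""
    return remove_stop_words(tokenize(clean(text)))


if __name__ == "__main__":
    print(preprocess("The car hit a wall at https://example.com/x!"))
```

Words such as “the” and “a” are dropped, matching the examples given in the quoted statement, while content words such as “car” and “wall” are kept.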


“…The authors (Jianqiang & Xiaolin, 2017) have conducted experiments to prove that the use of text preprocessing techniques results in better accuracy for Twitter sentiment analysis. The concept of lemmatization and stemming was jointly used by the authors (Pradha, Halgamuge, & Tran Quoc Vinh, 2019) on the Twitter dataset to perform text-based sentiment analysis.…”

Section: Preprocessing Of Text (mentioning)

confidence: 99%

Multimodal sentimental analysis for social media applications: A comprehensive review

Chandrasekaran, Nguyen, Hemanth 2021

WIREs Data Min & Knowl

The analysis of sentiments is essential in identifying and classifying opinions regarding a source material, that is, a product or service. The analysis of these sentiments finds a variety of applications like product reviews, opinion polls, movie reviews on YouTube, news video analysis, and health care applications including stress and depression analysis. The traditional approach of sentiment analysis, which is based on text, involves the collection of large textual data and different algorithms to extract the sentiment information from it. But multimodal sentimental analysis provides methods to carry out opinion analysis based on the combination of video, audio, and text, which goes way beyond the conventional text-based sentimental analysis in understanding human behaviors. The remarkable increase in the use of social media provides a large collection of multimodal data that reflects the user's sentiment on certain aspects. This multimodal sentimental analysis approach helps in classifying the polarity (positive, negative, and neutral) of the individual sentiments. Our work aims to present a survey of recent developments in analyzing the multimodal sentiments (involving text, audio, and video/image) which involve human-machine interaction and challenges involved in analyzing them. A detailed survey on sentimental dataset, feature extraction algorithms, data fusion methods, and efficiency of different classification techniques are presented in this work.


“…Furthermore, Pradha et al [71] proposed an effective technique for pre-processing text data and developed an algorithm to train Support Vector Machine (SVM), Deep Learning (DL) and Naïve Bayes (NB) classifiers for processing Twitter data, developing an algorithm to weight the feeling evaluation in relation to the weight of the hashtag and clean text. Sohrabi and Hemmatian [72] presented an efficient pre-processing method for opinion mining, testing it on Twitter user comments, and demonstrated how its use in combination with SVM and ANNs achieves the highest accuracy scores compared to other methods.…”

Section: Background and Related Work (mentioning)

confidence: 99%

An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian

Pota, Ventura, Catelli, et al. 2020

Sensors

Over the last decade industrial and academic communities have increased their focus on sentiment analysis techniques, especially applied to tweets. State-of-the-art results have been recently achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle the Twitter jargon. This work aims to introduce a different approach for Twitter sentiment analysis based on two steps. Firstly, the tweet jargon, including emojis and emoticons, is transformed into plain text, exploiting procedures that are language-independent or easily applicable to different languages. Secondly, the resulting tweets are classified using the language model BERT, but pre-trained on plain text, instead of tweets, for two reasons: (1) pre-trained models on plain text are easily available in many languages, avoiding resource- and time-consuming model training directly on tweets from scratch; (2) available plain text corpora are larger than tweet-only ones, therefore allowing better performance. A case study describing the application of the approach to Italian is presented, with a comparison with other Italian existing solutions. The results obtained show the effectiveness of the approach and indicate that, thanks to its general basis from a methodological perspective, it can also be promising for other languages.
