Purpose The main purpose of this paper is to analyze a tourist destination using sentiment analysis techniques with data from Twitter and Instagram to find the most representative entities (or places) and perceptions (or aspects) of the users. Design/methodology/approach The authors used 90,725 Instagram posts and 235,755 Twitter tweets to analyze tourism in Granada (Spain) to identify the important places and perceptions mentioned by travelers on both social media sites. The authors used several approaches for sentiment classification for English and Spanish texts, including deep learning models. Findings The best results in a test set were obtained using a bidirectional encoder representations from transformers (BERT) model for Spanish texts and Tweeteval for English texts, and these were subsequently used to analyze the data sets. It was then possible to identify the most important entities and aspects, and this, in turn, provided interesting insights for researchers, practitioners, travelers and tourism managers so that services could be improved and better marketing strategies formulated. Research limitations/implications The authors propose a Spanish-Tourism-BERT model for performing sentiment classification together with a process to find places through hashtags and to reveal the important negative aspects of each place. Practical implications The study enables managers and practitioners to implement the Spanish-BERT model with our Spanish Tourism data set that the authors released for adoption in applications to find both positive and negative perceptions. Originality/value This study presents a novel approach on how to apply sentiment analysis in the tourism domain. First, the way to evaluate the different existing models and tools is presented; second, a model is trained using BERT (deep learning model); third, an approach of how to identify the acceptance of the places of a destination through hashtags is presented and, finally, the evaluation of why the users express positivity (negativity) through the identification of entities and aspects.
PurposeThe main aim of this paper is to build an approach to analyze the tourist content posted on social media. The approach incorporates information extraction, cleaning, data processing, descriptive and content analysis and can be used on different social media platforms such as Instagram, Facebook, etc. This work proposes an approach to social media analytics in traveler-generated content (TGC), and the authors use Twitter to apply this study and examine data about the city and the province of Granada.Design/methodology/approachIn order to identify what people are talking and posting on social media about places, events, restaurants, hotels, etc. the authors propose the following approach for data collection, cleaning and data analysis. The authors first identify the main keywords for the place of study. A descriptive analysis is subsequently performed, and this includes post metrics with geo-tagged analysis and user metrics, retweets and likes, comments, videos, photos and followers. The text is then cleaned. Finally, content analysis is conducted, and this includes word frequency calculation, sentiment and emotion detection and word clouds. Topic modeling was also performed with latent Dirichlet association (LDA).FindingsThe authors used the framework to collect 262,859 tweets about Granada. The most important hashtags are #Alhambra and #SierraNevada, and the most prolific user is @AlhambraCultura. The approach uses a seasonal context, and the posted tweets are divided into two periods (spring–summer and autumn–winter). Word frequency was calculated and again Granada, Alhambra are the most frequent words in both periods in English and Spanish. The topic models show the subjects that are mentioned in both languages, and although there are certain small differences in terms of language and season, the Alhambra, Sierra Nevada and gastronomy stand out as the most important topics.Research limitations/implicationsExtremely difficult to identify sarcasm, posts may be ambiguous, users may use both Spanish and English words in their tweets and tweets may contain spelling mistakes, colloquialisms or even abbreviations. Multilingualism represents also an important limitation since it is not clear how tweets written in different languages should be processed. The size of the data set is also an important factor since the greater the amount of data, the better the results. One of the largest limitations is the small number of geo-tagged tweets as geo-tagging would provide information about the place where the tweet was posted and opinions of it.Originality/valueThis study proposes an interesting way to analyze social media data, bridging tourism and social media literature in the data analysis context and contributes to discover patterns and features of the tourism destination through social media. The approach used provides the prospective traveler with an overview of the most popular places and the major posters for a particular tourist destination. From a business perspective, it informs managers of the most influential users, and the information obtained can be extremely useful for managing their tourism products in that region.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.