Purpose – The purpose of this paper is to assess the reliability of numerical ratings of hotels calculated by three sentiment analysis algorithms. Design/methodology/approach – More than one million reviews and numerical ratings of hotels in seven cities in four countries were extracted from the TripAdvisor website. Reviews were classified as positive or negative using three sentiment analysis tools. The percentage of positive reviews was used to predict numerical ratings that were then compared with actual ratings. Findings – All tools classified reviews as positive or negative in a way that correlated positively with numerical ratings. More complex algorithms worked better, yet predicted ratings showed reasonable agreement with actual ratings for most cities. Predictions for hotels were less reliable if based on less than 50-60 percent of available reviews. Practical implications – These results confirm that sentiment analysis can be used to transform unstructured qualitative data on user opinion into quantitative ratings. Current tools may be useful for summarizing opinions in user reviews of products and services on websites that do not require users to post numerical ratings, such as traveler forums. This summarizing may be valuable not just to potential users but also to service and product providers, and it offers validation and benchmarking for future improvement of opinion mining and prediction techniques. Originality/value – This work assesses the correlation between sentiment analysis of hotel reviews and their actual ratings. The authors also evaluated the reliability of results of sentiment analysis calculated by three different algorithms.
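The prediction step described in this abstract can be sketched as follows. The abstract does not say how the percentage of positive reviews was converted into a rating, so the linear mapping onto a 1-5 scale is an assumption, as are the function names and the sample data, which are invented purely for illustration.

```python
from math import sqrt


def predicted_rating(labels, low=1.0, high=5.0):
    """Map the share of positive reviews onto a rating scale.

    Assumption: a linear mapping from [0, 1] to [low, high]; the abstract
    does not specify the actual conversion used by the authors.
    """
    if not labels:
        raise ValueError("at least one classified review is required")
    positive_share = sum(1 for label in labels if label == "positive") / len(labels)
    return low + positive_share * (high - low)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical classifier output per hotel (not data from the study).
hotel_reviews = {
    "hotel_a": ["positive"] * 9 + ["negative"] * 1,
    "hotel_b": ["positive"] * 6 + ["negative"] * 4,
    "hotel_c": ["positive"] * 3 + ["negative"] * 7,
}
# Hypothetical actual ratings on the same 1-5 scale.
actual = {"hotel_a": 4.5, "hotel_b": 3.5, "hotel_c": 2.0}

predicted = {hotel: predicted_rating(labels) for hotel, labels in hotel_reviews.items()}
hotels = sorted(actual)
r = pearson([predicted[h] for h in hotels], [actual[h] for h in hotels])

for h in hotels:
    print(f"{h}: predicted {predicted[h]:.2f}, actual {actual[h]:.2f}")
print(f"Pearson r = {r:.3f}")
```

A correlation coefficient is only one way to compare predicted and actual ratings; a mean absolute error would serve equally well for a per-hotel comparison.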
Purpose This paper reports on a quantitative study of data gathered from the Linked Open Vocabularies (LOV) catalogue, including the use of network analysis and metrics. The purpose of this paper is to gain insights into the structure of LOV and the use of vocabularies in the Web of Data. It is important to note that not all the vocabularies in the Web of Data are registered in LOV. Given the de-centralised and collaborative nature of the use and adoption of these vocabularies, the results of the study can be used to identify emergent important vocabularies that are shaping the Web of Data. Design/methodology/approach The methodology is based on an analytical approach to a data set that captures a complete snapshot of the LOV catalogue dated April 2014. An initial analysis of the data is presented in order to obtain insights into the characteristics of the vocabularies found in LOV. This is followed by an analysis of the use of Vocabulary of a Friend (VOAF) properties that describe relations among vocabularies. Finally, the study is complemented with an analysis of the usage of the different vocabularies, and concludes by proposing a number of metrics. Findings The most relevant insight is that, unsurprisingly, the vocabularies with the most presence are those used to model Semantic Web data, such as Resource Description Framework, RDF Schema and OWL, as well as broadly used standards such as Simple Knowledge Organization System, DCTERMS and DCE. It was also discovered that the most used language is English and that the vocabularies are not considered to be highly specialised in a field. Also, there is no dominant scope among the vocabularies. Regarding the structural analysis, it is concluded that LOV is a heterogeneous network. Originality/value The paper provides an empirical analysis of the structure of LOV and the relations between its vocabularies, together with some metrics that may be of help to determine the important vocabularies from a practical perspective.
The results are of interest for a better understanding of the evolution and dynamics of the Web of Data, and for applications that attempt to retrieve data in the Linked Data Cloud. These applications can benefit from the insights into the important vocabularies to be supported and the value added when mapping between and using the vocabularies.
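As a rough illustration of the kind of network analysis described above, the sketch below ranks vocabularies by how many other vocabularies link to them. The abstract does not define the metrics the paper proposes, so in-degree is used here only as a simple stand-in; the edge list and the `vocab_*` names are hypothetical and are not taken from the LOV snapshot.

```python
from collections import Counter

# Hypothetical (source, target) pairs meaning "source reuses or extends target".
# In the study, such relations are described with VOAF properties.
links = [
    ("vocab_a", "rdfs"),
    ("vocab_a", "owl"),
    ("vocab_b", "rdfs"),
    ("vocab_b", "skos"),
    ("vocab_c", "rdfs"),
    ("vocab_c", "vocab_a"),
]


def in_degree(edges):
    """Count how many links point at each vocabulary."""
    return Counter(target for _source, target in edges)


def out_degree(edges):
    """Count how many links each vocabulary makes to others."""
    return Counter(source for source, _target in edges)


# Vocabularies ordered from most to least referenced.
ranking = in_degree(links).most_common()
for name, count in ranking:
    print(f"{name}: referenced by {count} vocabularies")
```

A widely varying in-degree across nodes is one informal sign of the heterogeneity the findings mention, although the paper's own criteria may differ.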
Purpose - The common understanding of generalization/specialization relations assumes the relation to be equally strong between a classifier and any of its related classifiers and also at every level of the hierarchy. Assigning a grade of relative distance to represent the level of similarity between the related pairs of classifiers could correct this situation, which has been considered as an oversimplification of the psychological account of the real-world relations. The paper aims to discuss these issues. Design/methodology/approach - The evaluation followed an end-user perspective. In order to obtain a consistent data set of specialization distances, a group of 21 persons was asked to assign values to a set of relations from a selection of terms from the AGROVOC thesaurus. Then two sets of representations of the relations between the terms were built, one according to the calculated concept of specialization weights and the other one following the original order of the thesaurus.
In total, 40 persons were asked to choose between the two sets following an A/B test-like experiment. Finally, short interviews were carried out after the test to inquire about their decisions. Findings - The results show that the use of this information could be a valuable tool for search and information retrieval purposes and for the visual representation of knowledge organization systems (KOS). Furthermore, the methodology followed in the study turned out to be useful for detecting inconsistencies in the thesaurus and could thus be used for quality control and optimization of the hierarchical relations. Originality/value - The use of this relative distance information, namely, "concept specialization distance," has been proposed mainly at a theoretical level. In the current experiment, the authors evaluate the potential use of this information from an end-user perspective, not only for text-based interfaces but also its application for the visual representation of KOS. Finally, the methodology followed for the elaboration of the concept specialization distance data set showed potential for detecting possible inconsistencies in...