Twitter is used by millions of users to publish brief messages (tweets) in order to share experiences and/or opinions about a product or service. There is a clear need for systems that can mine these messages to derive information about the collective thinking of twitterers (e.g. for opinion or sentiment analysis). Tweet analysis is an important task because comments, opinions, suggestions, complaints, etc. can be used for marketing strategies or for assessing a company's reputation. For this purpose, it is necessary to establish automatically whether or not a tweet refers to a company when the company name is ambiguous. This is not a straightforward keyword search, as a name can be used in multiple contexts. The aim of this study is to present and compare four different approaches that improve the representation of short texts for better performance of the clustering task that determines whether a given tweet refers to a particular company.
INTRODUCTION

Twitter 1 (the microblog platform that allows users to publish brief messages of less than 140 characters) is a Web 2.0 application which offers a new mode of user interaction. It has become an important channel through which users can share their experiences or opinions about a product, service or company, and companies are taking advantage of this medium as part of their marketing strategies. Complete 2 has estimated that the use of Twitter increased drastically from 2009 to 2012, reaching up to 45 million unique visitors; however, the increase in 2012 was not as significant as in previous years. In 2012 and 2013 Twitter has been ...

... or not. For this purpose, we have used a variety of enrichment methodologies based on term expansion via the semantic similarity hidden behind the lexical structure, in order to improve the representation of tweets and, as a consequence, the performance of the task. We have used two different tweet datasets of company names which contain different levels of ambiguity. The results are promising, although they highlight the difficulty of this task.

Key Words: Clustering of tweets, opinion analysis, disambiguation, online reputation management.
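To make the idea of term expansion for short texts concrete, the following is a minimal sketch and not one of the four approaches compared in this study. It assumes a simple corpus co-occurrence count as a stand-in for semantic similarity; the target name `apple`, the example tweets, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations

TARGET = "apple"  # hypothetical ambiguous company name

# Invented tweets, for illustration only (not taken from the datasets).
RAW_TWEETS = [
    "apple iphone launch",
    "apple iphone store",
    "apple store opening",
    "apple pie recipe",
    "apple pie cinnamon",
    "apple cinnamon tart",
]


def tokenize(tweet):
    """Lowercase, split on whitespace, and drop the target name itself."""
    return [t for t in tweet.lower().split() if t != TARGET]


def cooccurrence_counts(docs):
    """Count how often each unordered pair of distinct terms shares a tweet."""
    counts = Counter()
    for tokens in docs:
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


def expand(tokens, counts, min_count=1):
    """Append terms that co-occur with a tweet term at least min_count times."""
    present = set(tokens)
    added = set()
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        if a in present and b not in present:
            added.add(b)
        elif b in present and a not in present:
            added.add(a)
    return list(tokens) + sorted(added)


def cosine(x, y):
    """Cosine similarity between two bags of words."""
    cx, cy = Counter(x), Counter(y)
    dot = sum(cx[t] * cy[t] for t in cx)
    norm_x = math.sqrt(sum(v * v for v in cx.values()))
    norm_y = math.sqrt(sum(v * v for v in cy.values()))
    norm = norm_x * norm_y
    return dot / norm if norm else 0.0


docs = [tokenize(t) for t in RAW_TWEETS]
counts = cooccurrence_counts(docs)
expanded = [expand(d, counts) for d in docs]

# Two company-sense tweets with no shared words, before and after expansion.
print(cosine(docs[0], docs[2]))
print(cosine(expanded[0], expanded[2]))
# A company-sense tweet against a fruit-sense tweet, after expansion.
print(cosine(expanded[0], expanded[5]))
```

In this toy corpus the first and third tweets share no words once the target name is removed, so their similarity is zero; after expansion they overlap on the terms they each co-occur with, while the company-sense and fruit-sense tweets remain dissimilar. A clustering algorithm run on the expanded representations would therefore have more evidence to work with, which is the motivation for enrichment stated above.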
Resumen (Abstract)

Twitter is used by millions of people to publish short messages for the purpose of sharing experiences and/or opinions about a particular product or service. There is a clear need to create systems capable of analyzing these messages in order to derive information about the collective thinking of the people who publish them. Tweet analysis has become a very important task for large companies, because comments, suggestions and complaints can be used for marketing strategies or to determine the reputation of a given company. Among other tasks, it is necessary to build methods that determine automatically whether or not a tweet refers to a company, in the case where the name of the co...