Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities, describes them with different attributes, and sometimes provides contradicting information. Hence, we introduce the spatial entity linkage problem, which identifies which pairs of spatial entities belong to the same physical spatial entity. Our proposed solution (QuadSky) starts with a spatial blocking technique (QuadFlex) that creates blocks of nearby spatial entities with the time complexity of the quadtree algorithm. After pairwise comparison of the spatial entities in the same block, we propose the SkyRank algorithm, which ranks the compared pairs using Pareto optimality. We introduce the SkyEx-* family of algorithms that can classify the pairs with 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of 777,452 pairs. Moreover, our fully unsupervised algorithm SkyEx-D approximates the optimal result with an F-measure loss of just 0.01. Finally, QuadSky provides the best trade-off between precision and recall and the best F-measure compared to the existing baselines.
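The idea of ranking compared pairs by Pareto optimality can be illustrated with a minimal sketch. It is not the authors' SkyRank implementation: the function names `dominates` and `skyline_layers`, the two-score similarity vectors, and the naive quadratic layering are all assumptions made for illustration. Each pair is represented by a vector of similarity scores (higher is better), and pairs are peeled off in successive skylines, so that an earlier layer holds pairs that no remaining pair dominates.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every score and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points: List[Sequence[float]]) -> List[List[int]]:
    """Group point indices into successive Pareto skylines (layer 0 = best).

    Hypothetical helper: a naive O(n^2) pass per layer, fine for small inputs.
    """
    remaining = list(range(len(points)))
    layers: List[List[int]] = []
    while remaining:
        # A point belongs to the current skyline if no other remaining point dominates it.
        layer = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        layers.append(layer)
        in_layer = set(layer)
        remaining = [i for i in remaining if i not in in_layer]
    return layers


# Illustrative (made-up) similarity vectors: (name similarity, address similarity).
pairs = [(0.9, 0.8), (0.5, 0.9), (0.4, 0.4), (0.1, 0.2), (0.9, 0.8)]
ranking = skyline_layers(pairs)
```

In this sketch, pairs in the earliest layers would be the strongest match candidates; where to cut the ranking into matches and non-matches is a separate decision that the sketch does not address.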
Twitter is a social network that provides a powerful source of data. The analysis of these data poses many challenges, among which stands out the opportunity to determine the reputation of a product, a person, or any other entity of interest. Several tools for sentiment analysis have been built to calculate the general opinion of an entity using a static analysis of the sentiments expressed in tweets. However, entities are not static; they collaborate with other entities and get involved in events. A simple aggregation of sentiments is therefore not sufficient to represent this dynamism. In this paper, we present a new approach that identifies the reputation of an entity on the basis of the set of events it is involved in, providing a transparent and self-explanatory way of interpreting reputation. To perform this analysis, we define a new sampling method based on tweet weighting to retrieve relevant information. In our experiments, we show that 90% of the reputation of an entity originates from the events it is involved in, especially in the case of entities that represent public figures.
Geo-social data has been an attractive source for a variety of problems such as mining mobility patterns, link prediction, location recommendation, and influence maximization. However, new geo-social data is increasingly unavailable and suffers from several limitations. In this paper, we aim to remedy the problem of effective data extraction from geo-social data sources. We first identify and categorize the limitations of extracting geo-social data. To overcome these limitations, we propose a novel seed-driven approach that uses the points of one source as seeds to feed as queries to the others. We additionally handle differences between the sources, and dynamics within them, by proposing three variants for optimizing the search radius. Furthermore, we provide an optimization based on recursive clustering to minimize the number of requests and an adaptive procedure to learn the specific data distribution of each source. Our comprehensive experiments with six popular sources show that our seed-driven approach yields 14.3 times more data overall, while our request-optimized algorithm retrieves up to 95% of the data with less than 16% of the requests. Thus, our proposed seed-driven approach sets new standards for effective and efficient extraction of geo-social data.
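The core of the seed-driven idea, using points from one source as query centers for another, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the names `seed_driven_extract` and `make_fake_source` are hypothetical, the target source is an in-memory stand-in for a real radius-query API, the coordinates are made up, and the search radius is fixed rather than optimized as in the variants the abstract mentions.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[str, float, float]  # (id, latitude, longitude)
Query = Callable[[float, float, float], List[Point]]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def make_fake_source(points: List[Point]) -> Query:
    """Hypothetical stand-in for a source that answers radius queries."""
    def query(lat: float, lon: float, radius_m: float) -> List[Point]:
        return [p for p in points if haversine_m(lat, lon, p[1], p[2]) <= radius_m]
    return query


def seed_driven_extract(seeds: List[Point], query: Query,
                        radius_m: float) -> Tuple[Dict[str, Point], int]:
    """Issue one radius query per seed; deduplicate results by id.

    Returns the retrieved points and the number of requests issued.
    """
    found: Dict[str, Point] = {}
    requests = 0
    for _, lat, lon in seeds:
        requests += 1
        for p in query(lat, lon, radius_m):
            found[p[0]] = p
    return found, requests


# Made-up example: one seed, one nearby target point, one far-away point.
seeds = [("s1", 57.0, 10.0)]
target = make_fake_source([("a", 57.0005, 10.0), ("b", 58.0, 10.0)])
points, n_requests = seed_driven_extract(seeds, target, radius_m=100.0)
```

Counting requests alongside retrieved points mirrors the trade-off the abstract reports; reducing that count, for example by clustering nearby seeds into one query, is left out of this sketch.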
Twitter is a social network that provides a powerful source of data. The analysis of these data poses many challenges, among which stands out the opportunity to determine the reputation of a product, a person, or any other entity of interest. Several approaches for sentiment analysis have been proposed in the literature to assess the general opinion expressed in tweets about an entity. Nevertheless, these methods aggregate sentiment scores retrieved from tweets, which gives only a static view of the overall reputation of an entity. The reputation of an entity is not static; entities collaborate with each other, and they get involved in different events over time. A simple aggregation of sentiment scores is therefore not sufficient to represent this dynamism. In this paper, we present a new approach to determine the reputation of an entity on the basis of the set of events in which it is involved. To achieve this, we propose a new sampling method driven by a tweet weighting measure to obtain a better-quality sample and summary of the target entity. We introduce the concept of Frequent Named Entities to determine the events involving the target entity. Our evaluation on different entities shows that 90% of the reputation of an entity originates from the events it is involved in, and that the breakdown into events allows the reputation to be interpreted in a transparent and self-explanatory way.