In recent years there has been growing interest in the role of networks and clusters in the global economy. Despite being a popular research topic in economics, sociology and urban studies, the geographical clustering of human activity has often been studied by means of predetermined geographical units such as administrative divisions and metropolitan areas. This approach is intrinsically time-invariant and does not allow one to differentiate between activities. Our goal in this paper is to present a new methodology for identifying clusters that can be applied to different empirical settings. We use a graph approach based on k-shell decomposition to analyze world biomedical research clusters based on PubMed scientific publications. We identify research institutions and locate their activities in geographical clusters. Leading areas of scientific production and their top-performing research institutions are consistently identified at different geographic scales.
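As a rough illustration of the k-shell decomposition mentioned above, the following is a minimal sketch on a plain undirected graph. It is not the paper's pipeline: the function name `k_shell_decomposition`, the edge-list input, and the toy graph are assumptions made for this example, and how the paper builds its graph from PubMed data is not specified here.

```python
from collections import defaultdict


def k_shell_decomposition(edges):
    """Return {node: shell index} for an undirected simple graph.

    Hypothetical helper for illustration only. A node's shell index is the
    value of k at which it is peeled away when nodes of degree <= k are
    removed repeatedly, for k = 1, 2, ...
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)

    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to shell k.
        to_peel = [node for node in remaining if degree[node] <= k]
        if not to_peel:
            k += 1
            continue
        while to_peel:
            node = to_peel.pop()
            if node not in remaining:
                continue  # already peeled via another path
            remaining.discard(node)
            shell[node] = k
            # Removing a node lowers the degree of its surviving neighbours,
            # which may pull them into the same shell.
            for neighbour in adjacency[node]:
                if neighbour in remaining:
                    degree[neighbour] -= 1
                    if degree[neighbour] <= k:
                        to_peel.append(neighbour)
    return shell


# Toy graph (an assumption): a triangle a-b-c with a pendant node d on a.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
shells = k_shell_decomposition(toy_edges)
```

In the toy graph the pendant node falls in shell 1 and the triangle in shell 2, which is the sense in which higher shells pick out more densely connected cores.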
In this paper, we are interested in understanding the interrelationships between mainstream and social media in forming public opinion during mass crises, specifically with regard to how events are framed in the mainstream news and on social networks and how the language used in those frames may allow one to infer political slant and partisanship. We study the linguistic choices for political agenda setting in mainstream and social media by analyzing a dataset of more than 40M tweets and more than 4M news articles from the mass protests in Ukraine during 2013-2014 known as "Euromaidan" and the post-Euromaidan conflict between Russian, pro-Russian and Ukrainian forces in eastern Ukraine and Crimea. We design a natural language processing algorithm to analyze, at scale, the linguistic markers that point to a particular political leaning in online media, and show that political slant in news articles and Twitter posts can be inferred with a high level of accuracy. These findings allow us to better understand the dynamics of partisan opinion formation during mass crises and the interplay between mainstream and social media in such circumstances.
The problem of identifying the optimal location for a new retail store has been the focus of past research, especially in the field of land economy, due to its importance in the success of a business. Traditional approaches to the problem have factored in demographics, revenue and aggregated human flow statistics from nearby or remote areas. However, the acquisition of relevant data is usually expensive. With the growth of location-based social networks, fine-grained data describing user mobility and the popularity of places has recently become attainable. In this paper we study the predictive power of various machine learning features on the popularity of retail stores in the city through the use of a dataset collected from Foursquare in New York. The features we mine are based on two general signals: geographic, where features are formulated according to the types and density of nearby places, and user mobility, which includes transitions between venues or the incoming flow of mobile users from distant areas. Our evaluation suggests that the best-performing features are common across the three different commercial chains considered in the analysis, although variations may exist too, as explained by heterogeneities in the way retail facilities attract users. We also show that performance improves significantly when combining multiple features in supervised learning algorithms, suggesting that the retail success of a business may depend on multiple factors.
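To make the "geographic" signal above more concrete, here is a minimal sketch of one possible density feature: counting venues, and venue categories, within a fixed radius of a candidate site. The function names, the 200 m radius, the category labels and the coordinates are all assumptions for illustration; the abstract does not give the exact feature definitions used in the paper.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_place_features(site, venues, radius_km=0.2):
    """Hypothetical geographic features for a candidate site.

    site      -- (lat, lon) tuple
    venues    -- iterable of (lat, lon, category) tuples
    radius_km -- neighbourhood radius; 0.2 km is an arbitrary choice here
    Returns (number of venues in range, Counter of their categories).
    """
    lat, lon = site
    in_range = [category for v_lat, v_lon, category in venues
                if haversine_km(lat, lon, v_lat, v_lon) <= radius_km]
    return len(in_range), Counter(in_range)


# Made-up example data: two venues close to the site, one several km away.
example_site = (40.7580, -73.9855)
example_venues = [
    (40.7585, -73.9855, "coffee shop"),
    (40.7578, -73.9850, "coffee shop"),
    (40.7000, -74.0100, "bookstore"),
]
density, category_counts = nearby_place_features(example_site, example_venues)
```

Features of this kind could then be combined with mobility-based ones as inputs to a supervised model, which is the combination the abstract reports as improving performance.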
In search of scalable solutions, CDNs are exploring P2P support. However, the benefits of peer assistance can be limited by various obstacle factors such as ISP friendliness (requiring peers to be within the same ISP), bitrate stratification (the need to match peers with others needing a similar bitrate), and partial participation (some peers choosing not to redistribute content). This work relates potential gains from peer assistance to the average number of users in a swarm (its capacity) and empirically studies the effects of these obstacle factors at scale, using a month-long trace of over 2 million users in London accessing BBC shows online. Results indicate that even when P2P swarms are localised within ISPs, up to 88% of traffic can be saved. Surprisingly, bitrate stratification results in two large sub-swarms and does not significantly affect savings. However, partial participation and the need for a minimum swarm size do affect gains. We investigate improvements to gain from increasing content availability through two well-studied techniques: content bundling (combining multiple items to increase availability) and historical caching of previously watched items. Bundling proves ineffective, as increased server traffic from larger bundles outweighs the benefits of availability, but simple caching can considerably boost traffic gains from peer assistance.