Josep M. Pujol scite author profile

An increasing fraction of today's social interactions occur using online social media as communication channels. Recent worldwide events, such as social movements in Spain or revolts in the Middle East, highlight their capacity to boost people's coordination. Online networks generally display a rich internal structure in which users can choose among different types and intensities of interaction. Despite this, there are still open questions regarding the social value of online interactions. For example, the existence of users with millions of online friends casts doubt on the relevance of these relations. In this work, we focus on Twitter, one of the most popular online social networks, and find that the network formed by the basic type of connections is organized in groups. The activity of the users conforms to the landscape determined by such groups. Furthermore, Twitter's distinction between different types of interactions allows us to establish a parallelism between online and offline social networks: personal interactions are more likely to occur on links internal to the groups (the weakness of strong ties); events transmitting new information go preferentially through links connecting different groups (the strength of weak ties), or even more so through links connecting to users who belong to several groups and act as brokers (the strength of intermediary ties).

The Little Engine(s) That Could: Scaling Online Social Networks

Pujol

Erramilli

Siganos

et al. 2012

IEEE/ACM Trans. Networking

104

171

The difficulty of scaling Online Social Networks (OSNs) has introduced new system design challenges that have often caused costly re-architecting for services like Twitter and Facebook. The complexity of interconnection of users in social networks has introduced new scalability challenges. Conventional vertical scaling by resorting to full replication can be a costly proposition. Horizontal scaling by partitioning and distributing data among multiple servers (e.g., using DHTs) can lead to costly inter-server communication. We design, implement, and evaluate SPAR, a social partitioning and replication middleware that transparently leverages the social graph structure to achieve data locality while minimizing replication. SPAR guarantees that for all users in an OSN, their direct neighbors' data is co-located on the same server. The gains from this approach are multi-fold: application developers can assume local semantics, i.e., develop as they would for a single server; scalability is achieved by adding commodity servers with low memory and network I/O requirements; and redundancy is achieved at a fraction of the cost. We detail our system design and an evaluation based on datasets from Twitter, Orkut, and Facebook, with a working implementation. We show that SPAR incurs minimum overhead, can help a well-known open-source Twitter clone reach Twitter's scale without changing a line of its application logic, and achieves higher throughput than Cassandra, Facebook's DHT-based key-value store database.

I Like It... I Like It Not: Evaluating User Ratings Noise in Recommender Systems

2009

Recent growing interest in predicting and influencing consumer behavior has generated a parallel increase in research efforts on Recommender Systems. Many of the state-of-the-art Recommender Systems algorithms rely on obtaining user ratings in order to later predict unknown ratings. An underlying assumption in this approach is that the user ratings can be treated as ground truth of the user's taste. However, users are inconsistent in giving their feedback, thus introducing an unknown amount of noise that challenges the validity of this assumption. In this paper, we tackle the problem of analyzing and characterizing the noise in user feedback through ratings of movies. We present a user study aimed at quantifying the noise in user ratings that is due to inconsistencies. We measure RMSE values that range from 0.557 to 0.8156. We also analyze how factors such as item sorting and time of rating affect this noise.
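
As a rough illustration of the RMSE measure mentioned in the abstract above, the following Python sketch computes the root-mean-square error between two rating trials by the same user. The function name `rmse`, the dictionary layout, and the sample ratings are assumptions made for illustration only; they are not taken from the paper or its data.

```python
import math


def rmse(first_trial, second_trial):
    """Root-mean-square error between two rating trials.

    Each trial maps an item id to a numeric rating. Only items rated
    in both trials are compared (an assumption of this sketch).
    """
    common_items = first_trial.keys() & second_trial.keys()
    if not common_items:
        raise ValueError("no items were rated in both trials")
    squared_errors = [
        (first_trial[item] - second_trial[item]) ** 2 for item in common_items
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings by one user for the same four movies, collected twice.
trial_1 = {"movie_a": 4, "movie_b": 3, "movie_c": 5, "movie_d": 2}
trial_2 = {"movie_a": 4, "movie_b": 4, "movie_c": 4, "movie_d": 2}

noise = rmse(trial_1, trial_2)
print(f"RMSE between trials: {noise:.4f}")
```

An RMSE of zero would mean the user repeated every rating exactly; larger values indicate more inconsistency between trials.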

Data Mining Methods for Recommender Systems

et al. 2010

Tracking the Trackers

Yu

Macbeth

Modi

et al. 2016

Online tracking poses a serious privacy challenge that has drawn significant attention in both academia and industry. Existing approaches for preventing user tracking, based on curated blocklists, suffer from limited coverage and coarse-grained resolution for classification, rely on exceptions that impact sites' functionality and appearance, and require significant manual maintenance. In this paper we propose a novel approach, based on concepts drawn from k-Anonymity, in which users collectively identify unsafe data elements, which have the potential to uniquely identify an individual user, and remove them from requests. We deployed our system to 200,000 German users running the Cliqz Browser or the Cliqz Firefox extension to evaluate its efficiency and feasibility. Results indicate that our approach achieves better privacy protection than blocklists, as provided by Disconnect, while keeping site breakage to a minimum, even lower than the community-optimized Ad-Block Plus. We also provide evidence of the prevalence and reach of trackers across more than 21 million pages of 350,000 unique sites, the largest-scale empirical evaluation to date. 95% of the pages visited contain third-party requests to potential trackers and 78% attempt to transfer unsafe data. Tracker organizations are also ranked, showing that a single organization can reach up to 42% of all page visits in Germany.
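
The k-Anonymity idea in the abstract above can be sketched in a few lines of Python: a value is treated as unsafe when fewer than k distinct users have been seen sending it, and unsafe values are dropped from a request's parameters. This is only a minimal illustration under stated assumptions. The threshold `K`, the function names `unsafe_values` and `strip_unsafe`, and the sample observations are all hypothetical and do not describe the deployed system.

```python
from collections import defaultdict

K = 3  # hypothetical anonymity threshold; the abstract does not give one


def unsafe_values(observations, k=K):
    """Return the values seen from fewer than k distinct users.

    `observations` is an iterable of (user_id, value) pairs.
    """
    users_by_value = defaultdict(set)
    for user_id, value in observations:
        users_by_value[value].add(user_id)
    return {value for value, users in users_by_value.items() if len(users) < k}


def strip_unsafe(params, unsafe):
    """Drop request parameters whose value is flagged as unsafe."""
    return {key: value for key, value in params.items() if value not in unsafe}


# Hypothetical observations: a language tag shared by three users and
# two identifier-like values that each come from a single user.
observations = [
    ("u1", "en-US"), ("u2", "en-US"), ("u3", "en-US"),
    ("u1", "uid=8f3a"), ("u2", "uid=c41d"),
]

unsafe = unsafe_values(observations)
cleaned = strip_unsafe({"lang": "en-US", "id": "uid=8f3a"}, unsafe)
print(cleaned)
```

Values shared by many users carry little identifying power and pass through, while values unique to one or two users are removed before the request is sent.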

Fair Routing in Delay Tolerant Networks

Pujol

Toledo

Rodríguez

2009

The typical state-of-the-art routing algorithms for delay tolerant networks are based on best next hop hill-climbing heuristics in order to achieve throughput and efficiency. The combination of these heuristics and the social network structure leads the routing to direct most of the traffic through a small subset of good users. For instance, in the SimBet algorithm, the top 10% of users carry out 54% of all the forwards and 85% of all the handovers. This unfair load distribution is not sustainable as it can quickly deplete constrained resources in heavily utilized mobile devices (e.g., storage, battery, budget). Moreover, because a small number of users carry a significant amount of the traffic, the system is not robust to random failures and attacks. To overcome these inefficiencies, this paper introduces FairRoute, a routing algorithm for delay tolerant networks inspired by two social processes: perceived interaction strength, where messages are preferably forwarded to users that have a stronger social relation with the target of the message; and assortativity, which limits the exchange of messages to those users with similar "social status". We compare the performance of FairRoute to the state-of-the-art algorithms by extensive simulations on the MIT reality mining dataset. The results show that our algorithm outperforms existing algorithms in the de facto benchmark of throughput vs. forwards. Furthermore, it distributes the load better: the top 10% of users carry out 26% of the forwards and 28% of the handovers without any loss in performance.
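
The load-concentration figures quoted in the abstract above (the share of forwards carried by the top 10% of users) correspond to a simple statistic, sketched here in Python. The function name `top_share` and the per-user counts are invented for illustration; they are not the paper's data or code.

```python
import math


def top_share(counts_per_user, fraction=0.10):
    """Fraction of the total load carried by the busiest `fraction` of users."""
    ordered = sorted(counts_per_user, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("total load is zero")
    # Always include at least one user in the top group.
    top_n = max(1, math.ceil(len(ordered) * fraction))
    return sum(ordered[:top_n]) / total


# Hypothetical number of forwards performed by each of ten users.
forwards = [30, 15, 12, 10, 9, 8, 6, 5, 3, 2]

share = top_share(forwards)
print(f"Top 10% of users carry {share:.0%} of the forwards")
```

A lower share for the same total traffic means the load is spread more evenly across users.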

The Effect of Ongoing Exposure Dynamics in Dose Response Relationships

et al. 2009

Characterizing infectivity as a function of pathogen dose is integral to microbial risk assessment. Dose-response experiments usually administer doses to subjects at one time. Phenomenological models of the resulting data, such as the exponential and the Beta-Poisson models, ignore dose timing and assume independent risks from each pathogen. Real-world exposure to pathogens, however, is a sequence of discrete events where concurrent or prior pathogen arrival affects the capacity of immune effectors to engage and kill newly arriving pathogens. We model immune effector and pathogen interactions during the period before infection becomes established in order to capture the dynamics generating dose timing effects. Model analysis reveals an inverse relationship between the time over which exposures accumulate and the risk of infection. Data from one-time dose experiments will thus overestimate per-pathogen infection risks of real-world exposures. For instance, fitting our model to one-time dosing data reveals a risk of 0.66 from 313 Cryptosporidium parvum pathogens. When the temporal exposure window is increased 100-fold using the same parameters fitted by our model to the one-time dose data, the risk of infection is reduced to 0.09. Confirmation of this risk prediction requires data from experiments administering doses with different timings. Our model demonstrates that dose timing could markedly alter the risks generated by airborne versus fomite-transmitted pathogens.

Informing Optimal Environmental Influenza Interventions: How the Host, Agent, and Environment Alter Dominant Routes of Transmission

et al. 2010

Influenza can be transmitted through respirable (small airborne particles), inspirable (intermediate size), direct-droplet-spray, and contact modes. How these modes are affected by features of the virus strain (infectivity, survivability, transferability, or shedding profiles), host population (behavior, susceptibility, or shedding profiles), and environment (host density, surface area to volume ratios, or host movement patterns) has only recently come under investigation. A discrete-event, continuous-time, stochastic transmission model was constructed to analyze the environmental processes through which a virus passes from one person to another via different transmission modes, and to explore which factors increase or decrease different modes of transmission. With the exception of the inspiratory route, each route on its own can cause high transmission in isolation from other modes. Mode-specific transmission was highly sensitive to parameter values. For example, droplet and respirable transmission usually required high host density, while the contact route had no such requirement. Depending on the specific context, one or more modes may be sufficient to cause high transmission, while in other contexts no transmission may result. Because of this, when making intervention decisions that involve blocking environmental pathways, generic recommendations applied indiscriminately may be ineffective; instead, intervention choice should be contextualized, depending on the specific features of the people, virus strain, or venue in question.

Josep M. Pujol

Social Features of Online Networks: The Strength of Intermediary Ties in Online Social Media
