Given their importance in shaping social networks and determining how information or transmissible diseases propagate in a population, interactions between individuals are the subject of many data collection efforts. To this end, different methods are commonly used, ranging from diaries and surveys to decentralised infrastructures based on wearable sensors. Each of these methods has advantages and limitations, but they are rarely compared in a given setting. Moreover, as surveys targeting friendship relations might suffer less from memory biases than contact diaries, it is interesting to explore how actual contact patterns occurring in day-to-day life compare with friendship relations and with online social links. Here we make progress in these directions by leveraging data collected in a French high school concerning (i) face-to-face contacts measured by two concurrent methods, namely wearable sensors and contact diaries, (ii) self-reported friendship surveys, and (iii) online social links. We compare the resulting data sets and find that most short contacts are not reported in diaries while long contacts have a high reporting probability, and that the durations of contacts tend to be overestimated in the diaries. Moreover, measured contacts corresponding to reported friendships can have durations of any length, but all long contacts do correspond to a reported friendship. By contrast, online links that are not also reported in the friendship survey correspond to short face-to-face contacts, highlighting the difference in nature between reported friendships and online links. Diaries and surveys also suffer from a low sampling rate, as many students did not fill them in, showing that the sensor-based platform had a higher acceptability.
We also show that, despite the biases of diaries and surveys, the overall structure of the contact network, as quantified by the mixing patterns between classes, is correctly captured by both the network of self-reported contacts and the network of friendships, and we investigate the correlations between the number of neighbors of individuals in the three networks. Overall, diaries and surveys tend to yield a correct picture of the global structural organization of the contact network, albeit with far fewer links, and give access to a sort of backbone of the contact network corresponding to the strongest links, i.e., the contacts of longest cumulative durations.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Network topology plays a key role in many phenomena, from the spreading of diseases to that of financial crises. Whenever the whole structure of a network is unknown, one must resort to reconstruction methods that identify the least biased ensemble of networks consistent with the partial information available. A challenging case, frequently encountered due to privacy issues in the analysis of interbank flows and Big Data, is when only local (node-specific) aggregate information is available. For binary networks, the relevant ensemble is one where the degree (number of links) of each node is constrained to its observed value. However, for weighted networks the problem is much more complicated. While the naïve approach prescribes constraining the strengths (total link weights) of all nodes, recent counter-intuitive results suggest that in weighted networks the degrees are often more informative than the strengths. This implies that the reconstruction of weighted networks would be significantly enhanced by the specification of both strengths and degrees, a computationally hard and bias-prone procedure. Here we solve this problem by introducing an analytical and unbiased maximum-entropy method that works in the shortest possible time and does not require the explicit generation of reconstructed samples. We consider several real-world examples and show that, while the strengths alone give poor results, the additional knowledge of the degrees yields accurately reconstructed networks. Information-theoretic criteria rigorously confirm that the degree sequence, as soon as it is non-trivial, is irreducible to the strength sequence.
Our results have strong implications for the analysis of motifs and communities and whenever the reconstructed ensemble is required as a null model to detect higher-order patterns.
A range of phenomena of critical importance, from the spread of infectious diseases to the diffusion of opinions and the propagation of financial crises, is highly sensitive to the topology of the underlying network that mediates the interactions [1]. This sensitivity implies that, whenever it is not possible to have complete empirical knowledge of the network, one should make optimal use of the partial information available and try to reconstruct the most likely network, or rather an ensemble of likely networks, in the least biased way. In the Big Data era, this kind of problem is becoming more and more important given the ever-increasing availability of data that, for privacy reasons, are often of aggregate nature [2, 3].
Among the possible types of incomplete topological information (e.g. missing links, missing nodes, etc.), one of the most frequently encountered situations is when only a local knowledge of the network is available [6][7][8][9][10][11]. For instance, in binary n...
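To make the reconstruction problem concrete, the following is a sketch of the standard maximum-entropy construction when both degrees and strengths are constrained. It assumes an undirected network with non-negative integer weights w_ij; the symbols \alpha_i, \beta_i, x_i, y_i and q_ij are notation introduced here for illustration and are not taken from the text above.

```latex
% Maximize the Shannon entropy of P(W) subject to
% <k_i> = k_i^* and <s_i> = s_i^* for every node i.
P(W) = \frac{e^{-H(W)}}{Z}, \qquad
H(W) = \sum_i \bigl[ \alpha_i\, k_i(W) + \beta_i\, s_i(W) \bigr],
\qquad
k_i(W) = \sum_{j \neq i} \Theta(w_{ij}), \quad
s_i(W) = \sum_{j \neq i} w_{ij}.

% With x_i = e^{-\alpha_i} and y_i = e^{-\beta_i} (requiring y_i y_j < 1),
% P(W) factorizes over node pairs i < j, each with weight distribution
q_{ij}(w) = \frac{(x_i x_j)^{\Theta(w)}\,(y_i y_j)^{w}\,(1 - y_i y_j)}
                 {1 - y_i y_j + x_i x_j y_i y_j},
\qquad w = 0, 1, 2, \dots

% Expected link presence and expected weight for the pair (i, j):
\langle a_{ij} \rangle = \frac{x_i x_j y_i y_j}{1 - y_i y_j + x_i x_j y_i y_j},
\qquad
\langle w_{ij} \rangle = \frac{\langle a_{ij} \rangle}{1 - y_i y_j}.
```

Under these assumptions, the 2N unknowns x_i and y_i would be fixed by requiring that the sums of \langle a_{ij} \rangle and \langle w_{ij} \rangle over j \neq i equal the observed degree and strength of each node i. Expected network properties then follow analytically from q_ij, without generating explicit samples.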
Sampling random graphs with given properties is a key step in the analysis of networks, as random ensembles represent basic null models required to identify patterns such as communities and motifs. An important requirement is that the sampling process is unbiased and efficient. The main approaches are microcanonical, i.e. they sample graphs that match the enforced constraints exactly. Unfortunately, when applied to strongly heterogeneous networks (like most real-world examples), the majority of these approaches become biased and/or time-consuming. Moreover, the algorithms defined in the simplest cases, such as binary graphs with given degrees, are not easily generalizable to more complicated ensembles. Here we propose a solution to the problem via the introduction of a 'Maximize and Sample' ('Max & Sam' for short) method to correctly sample ensembles of networks where the constraints are 'soft', i.e. realized as ensemble averages. Our method is based on exact maximum-entropy distributions and is therefore unbiased by construction, even for strongly heterogeneous networks. It is also more computationally efficient than most microcanonical alternatives. Finally, it works for both binary and weighted networks with a variety of constraints, including combined degree-strength sequences and full reciprocity structure, for which no alternative method exists. Our canonical approach can in principle be turned into an unbiased microcanonical one, via a restriction to the relevant subset. Importantly, the analysis of the fluctuations of the constraints suggests that the microcanonical and canonical versions of all the ensembles considered here are not equivalent. We show various real-world applications and provide code implementing all our algorithms.
Unfortunately, given the strong heterogeneity of nodes (e.g. the power-law distribution of vertex degrees), the solution to the above problem is not simple.
This is most easily explained in the case of binary graphs, even if similar arguments apply to weighted networks as well. For simple graphs, the most important null model is the (undirected binary) configuration model (UBCM), defined as an ensemble of networks where the degree of each node is specified, and the rest of the topology is maximally random [8][9][10]. Since the degrees of all nodes (the so-called degree sequence) act as constraints, 'maximally random' does not mean 'completely random': in order to realize the degree sequence, interdependencies among vertices necessarily arise. These interdependencies affect other topological properties as well. So, even if the degree sequence is the only quantity that is enforced 'on purpose', other structural properties are unavoidably constrained as well. These higher-order effects are called 'structural correlations'. In order to disentangle spurious structural correlations from genuine correlations of interest, it is very important to properly implement the UBCM in such a way that it takes the observed degree sequence as input and generates expectations based on a uniform ...
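As a minimal sketch of how a degree-constrained null model can be computed in its canonical ('soft-constraint') form, the Python code below fits the standard maximum-entropy link probabilities p_ij = x_i x_j / (1 + x_i x_j) to a degree sequence and then samples graphs from them. The function names, the fixed-point solver and the toy degree sequence are assumptions chosen for illustration; this is not the code provided by the authors.

```python
import numpy as np


def fit_ubcm(degrees, tol=1e-10, max_iter=10000):
    """Solve k_i = sum_{j != i} x_i x_j / (1 + x_i x_j) for the hidden variables x_i.

    Uses the fixed-point iteration x_i <- k_i / sum_{j != i} x_j / (1 + x_i x_j).
    Assumes every degree is strictly smaller than n - 1 so that all x_i stay finite.
    """
    k = np.asarray(degrees, dtype=float)
    x = k / np.sqrt(k.sum())  # simple starting guess (illustrative choice)
    for _ in range(max_iter):
        # ratio[i, j] = x_j / (1 + x_i x_j); self-pairs are excluded
        ratio = x[None, :] / (1.0 + np.outer(x, x))
        np.fill_diagonal(ratio, 0.0)
        x_new = k / ratio.sum(axis=1)
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    return x


def link_probabilities(x):
    """Return the matrix p_ij = x_i x_j / (1 + x_i x_j) with a zero diagonal."""
    xx = np.outer(x, x)
    p = xx / (1.0 + xx)
    np.fill_diagonal(p, 0.0)
    return p


def sample_graph(p, rng):
    """Draw one undirected simple graph by sampling each pair i < j independently."""
    n = p.shape[0]
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)


if __name__ == "__main__":
    observed_degrees = [3, 2, 2, 2, 1]  # toy degree sequence (hypothetical data)
    x = fit_ubcm(observed_degrees)
    p = link_probabilities(x)
    print("expected degrees:", p.sum(axis=1))

    rng = np.random.default_rng(0)
    samples = [sample_graph(p, rng) for _ in range(2000)]
    print("sample-mean degrees:", np.mean([a.sum(axis=1) for a in samples], axis=0))
```

In this sketch the degrees are matched only on average over the ensemble: individual sampled graphs fluctuate around the observed sequence, which is what distinguishes a canonical ensemble from a microcanonical one that matches the constraints exactly.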