Edges in a network can be divided into two kinds according to their roles: some enhance locality, such as those inside a cluster, while others contribute to global connectivity, such as those connecting two clusters. A recent study by Onnela et al. uncovered the weak-ties effect in mobile communication. In this paper, we provide complementary results on document networks: the edges connecting nodes whose content is less similar are more significant in maintaining global connectivity. We propose an index called bridgeness to quantify the significance of an edge in maintaining connectivity; it depends only on local information about the network topology. We compare bridgeness with content similarity and several other structural indices using an edge percolation process. Experimental results on document networks show that bridgeness outperforms content similarity in characterizing edge significance. Furthermore, extensive numerical results on disparate networks indicate that the bridgeness is also better than some well-known indices on edge
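The edge percolation process mentioned above can be illustrated with a minimal sketch: edges are removed one at a time in ascending order of some index, and the fraction of nodes in the largest connected component is recorded after each removal. The ranking score used here (the number of common neighbours of an edge's endpoints) is only a stand-in assumption, not the bridgeness index defined in the paper, and all function names are hypothetical.

```python
from collections import defaultdict


def largest_component_fraction(nodes, edges):
    """Return the fraction of nodes that lie in the largest connected component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search over one component, counting its nodes.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(nodes)


def common_neighbour_score(edges):
    """Stand-in edge index (assumption): number of common neighbours of the endpoints."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}


def edge_percolation(nodes, edges, score):
    """Remove edges in ascending order of score; record the largest-component fraction."""
    remaining = sorted(edges, key=lambda e: score[e])
    curve = []
    while remaining:
        remaining.pop(0)
        curve.append(largest_component_fraction(nodes, remaining))
    return curve


# Two triangles joined by a single bridge edge (2, 3).
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
curve = edge_percolation(nodes, edges, common_neighbour_score(edges))
print(curve[0])  # 0.5: the bridge is removed first and the graph splits in half
```

An index that ranks the truly significant edges first makes this curve drop quickly, which is how competing indices can be compared under such a process.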
Document networks are distinctive in that a document node, e.g. a webpage or an article, carries meaningful content. Properties of document networks are affected not only by the topological connectivity between nodes but also, strongly, by the semantic relations between the nodes' content. We observe that document networks have a large number of triangles and a high clustering coefficient, and that the probability of forming a triangle correlates strongly with the content similarity among the three nodes involved. We propose the degree-similarity product (DSP) model, which reproduces these properties well. The model achieves this through a preferential attachment mechanism that favours links between nodes that are both popular and similar. This work is a step towards a better understanding of the structure and evolution of document networks.
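A preferential attachment rule that favours nodes that are both popular and similar could be sketched as below. This is an illustration under stated assumptions, not the DSP model as specified in the paper: the attachment weight is taken to be the plain product of degree and similarity, similarity is measured by Jaccard overlap of word sets, and the uniform fallback and all function names are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two word sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dsp_weights(new_content, degrees, contents):
    """Attachment weight of each existing node: degree times similarity to the new node."""
    return {n: degrees[n] * jaccard(new_content, contents[n]) for n in degrees}


def choose_target(new_content, degrees, contents, rng):
    """Pick one existing node with probability proportional to its weight."""
    weights = dsp_weights(new_content, degrees, contents)
    nodes = list(weights)
    if sum(weights.values()) == 0:
        # Assumed fallback: choose uniformly when no node has positive weight.
        return rng.choice(nodes)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]


degrees = {"a": 4, "b": 1, "c": 10}
contents = {
    "a": {"graph", "network"},
    "b": {"graph", "network"},
    "c": {"protein"},
}
new_content = {"graph", "network"}
print(dsp_weights(new_content, degrees, contents))
# {'a': 4.0, 'b': 1.0, 'c': 0.0}
print(choose_target(new_content, degrees, contents, random.Random(0)))
```

In this toy example the highest-degree node "c" is never chosen because it shares no content with the new node, while "a" is four times as likely as "b" because the two are equally similar but "a" has four times the degree.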
For the study of citation networks, a challenging problem is modeling the
high clustering. Existing studies indicate that a promising way to model the
high clustering is a copying strategy, i.e., a paper copies the references of
its neighbour as its own references. However, this line of models greatly
underestimates the number of triangles observed in real citation networks and
thus cannot model the high clustering well. In this paper, we point out that
existing models fail because they do not capture the connecting patterns
among existing papers. By leveraging the knowledge indicated by such
connecting patterns, we propose a new model for the high clustering in
citation networks. Experiments on two real-world citation networks, one from
a specialized research area and one from a multidisciplinary research area,
demonstrate that our model reproduces not only the power-law degree
distribution, as traditional models do, but also the number of triangles, the
high clustering coefficient and the size distribution of co-citation clusters
observed in these real networks.
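The generic copying strategy described above can be sketched as follows. This illustrates the baseline idea only, not the new model proposed in the paper: each new paper cites one uniformly chosen existing paper and copies each of that paper's references with probability `copy_prob`. The parameter and function names are hypothetical.

```python
import random


def grow_copying_network(n_papers, copy_prob, rng):
    """Grow a citation network; return a dict mapping each paper to the set it cites."""
    refs = {0: set()}
    for new in range(1, n_papers):
        target = rng.randrange(new)  # uniformly chosen existing paper
        cited = {target}
        for r in refs[target]:
            # Copy each reference of the target independently.
            if rng.random() < copy_prob:
                cited.add(r)
        refs[new] = cited
    return refs


def count_triangles(refs):
    """Count triangles, treating citations as undirected edges."""
    adj = {p: set() for p in refs}
    for p, cited in refs.items():
        for q in cited:
            adj[p].add(q)
            adj[q].add(p)
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                # Count each triangle once, with its nodes in increasing order.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


rng = random.Random(42)
for p in (0.0, 0.5, 1.0):
    network = grow_copying_network(200, p, rng)
    print(p, count_triangles(network))
```

With `copy_prob = 0` the network is a tree and contains no triangles; every copied reference closes one triangle between the new paper, its target and the copied reference.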