Statistical physics has proven to be a fruitful framework for describing phenomena outside the realm of traditional physics. Recent years have witnessed attempts by physicists to study collective phenomena emerging from the interactions of individuals as elementary units in social structures. Here we review the state of the art, focusing on a wide range of topics, from opinion, cultural, and language dynamics to crowd behavior, hierarchy formation, human dynamics, and social spreading. We highlight the connections between these problems and other, more traditional, topics of statistical physics, and we emphasize the comparison of model results with empirical data from social systems.
The investigation of community structures in networks is an important issue in many domains and disciplines. This problem is relevant for social tasks (objective analysis of relationships on the web), biological inquiries (functional studies in metabolic and protein networks), and technological problems (optimization of large infrastructures). Several types of algorithms exist for revealing the community structure in networks, but a general and quantitative definition of community is not built into these algorithms, making the results intrinsically difficult to interpret without additional nontopological information. In this article we address this problem by showing how quantitative definitions of community are implemented in practice in the existing algorithms. In this way the algorithms for the identification of the community structure become fully self-contained. Furthermore, we propose a local algorithm to detect communities which outperforms the existing algorithms in computational cost while keeping the same level of reliability. The algorithm is tested on artificial and real-world graphs. In particular, we show how the algorithm applies to a network of scientific collaborations which, because of its size, cannot be tackled with the usual methods. This type of local algorithm could open the way to applications to large-scale technological and biological systems.
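The abstract does not spell out the algorithm itself, but the general flavor of local community detection can be illustrated with a toy greedy expansion: grow a community outward from a seed node, adding the boundary node that most improves the fraction of edge endpoints staying inside the community, and stop when no addition helps. Everything below (the score function, the stopping rule, the example graph) is an illustrative assumption, not the paper's actual method.

```python
def local_community(adj, seed_node):
    """Greedy local expansion (an illustrative stand-in, not the paper's
    algorithm): repeatedly add the boundary node that most improves the
    fraction of edge endpoints staying inside the community."""
    def score(comm):
        internal = external = 0
        for u in comm:
            for v in adj[u]:
                if v in comm:
                    internal += 1   # counted once per endpoint
                else:
                    external += 1
        total = internal + external
        return internal / total if total else 0.0

    community = {seed_node}
    while True:
        boundary = {v for u in community for v in adj[u]} - community
        best, best_score = None, score(community)
        for v in boundary:
            s = score(community | {v})
            if s > best_score:
                best, best_score = v, s
        if best is None:
            return community
        community.add(best)

# Two triangles bridged by a single edge (2-3); expanding from node 0
# recovers the left triangle {0, 1, 2}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
left = local_community(adj, 0)
```

The key property of such local methods is that each step touches only the community and its boundary, so the cost does not depend on the size of the whole graph.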
What processes can explain how very large populations are able to converge on the use of a particular word or grammatical construction without global coordination? Answering this question helps to understand why new language constructs usually propagate along an S-shaped curve with a rather sudden transition towards global agreement. It also helps to analyze and design new technologies that support or orchestrate self-organizing communication systems, such as recent social tagging systems for the web. The article introduces and studies a microscopic model of communicating autonomous agents performing language games without any central control. We show that the system undergoes a disorder/order transition, going through a sharp symmetry-breaking process to reach a shared set of conventions. Before the transition, the system builds up non-trivial scale-invariant correlations, for instance in the distribution of competing synonyms, which display a Zipf-like law. These correlations make the system ready for the transition towards shared conventions, which, observed on the time-scale of collective behaviors, becomes sharper and sharper with system size. This surprising result not only explains why human language can scale up to very large populations but also suggests ways to optimize artificial semiotic dynamics.
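The minimal version of such a language game, often called the naming game, is easy to simulate. The sketch below is a generic toy implementation; the agent count, pairing rule, and step limit are illustrative choices, not details taken from the article:

```python
import random

def naming_game(n_agents=50, max_steps=200_000, seed=0):
    """Toy naming game: each agent keeps an inventory of known words.
    A random speaker utters a word from its inventory (inventing one if
    the inventory is empty); on success both agents collapse to that
    word, on failure the hearer learns it."""
    rng = random.Random(seed)
    inventories = [[] for _ in range(n_agents)]
    next_word = 0
    for step in range(max_steps):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            inventories[speaker].append(next_word)  # invent a new word
            next_word += 1
        word = rng.choice(inventories[speaker])
        if word in inventories[hearer]:
            # success: both agents drop every competing synonym
            inventories[speaker] = [word]
            inventories[hearer] = [word]
        else:
            inventories[hearer].append(word)        # failure: hearer learns it
        # global agreement: every agent holds exactly the same single word
        if all(len(inv) == 1 for inv in inventories) \
                and len({inv[0] for inv in inventories}) == 1:
            return step + 1
    return max_steps

steps_to_consensus = naming_game()
```

With a fixed seed the run is reproducible, and global consensus emerges well before the step limit, even though no agent ever sees more than one other agent per interaction.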
Can one construct a thermodynamics for compact, slowly moving powders and grains? A few years ago, Edwards proposed a possible step in this direction, raising the fascinating perspective that such systems have a statistical mechanics of their own, different from that of Maxwell, Boltzmann, and Gibbs, allowing us to obtain some information about them while still ignoring dynamical details. Recent developments in the theory of glasses have confirmed these ideas within mean field. In order to go beyond mean field, we explicitly generate Edwards' measure in a 3D model. Comparison of the results with irreversible compaction data shows very good agreement. The present framework immediately suggests new experimental checks.
Collaborative tagging has been quickly gaining ground because of its ability to recruit the activity of web users into effectively organizing and sharing vast amounts of information. Here we collect data from a popular system and investigate the statistical properties of tag cooccurrence. We introduce a stochastic model of user behavior embodying two main aspects of collaborative tagging: (i) a frequency-bias mechanism related to the idea that users are exposed to each other's tagging activity; (ii) a notion of memory, or aging of resources, in the form of a heavy-tailed access to the past state of the system. Remarkably, our simple modeling is able to account quantitatively for the observed experimental features with a surprisingly high accuracy. This points in the direction of a universal behavior of users who, despite the complexity of their own cognitive processes and the uncoordinated and selfish nature of their tagging activity, appear to follow simple activity patterns.

Recently, a new paradigm has been quickly gaining ground on the World Wide Web: collaborative tagging (1-3). In web applications like Del.icio.us (http://del.icio.us), Flickr (www.flickr.com), CiteULike (www.citeulike.org), and Connotea (www.connotea.org), users manage, share, and browse collections of online resources by enriching them with semantically meaningful information in the form of freely chosen text labels (tags). The paradigm of collaborative tagging has been successfully deployed in web applications designed to organize and share diverse online resources such as bookmarks, digital photographs, academic papers, music, and more. Web users interact with a collaborative tagging system by posting content (resources) into the system, and associating text strings (tags) with that content, as shown in Fig. 1.
At the global level, the set of tags, although determined with no explicit coordination, evolves in time and leads toward patterns of terminology usage that are shared by the entire user community. Hence, one observes the emergence of a loose categorization system that can be effectively used to navigate through a large and heterogeneous body of resources. Focusing on tags as basic dynamical entities, the process of collaborative tagging falls within the scope of semiotic dynamics (4-6), a new field that studies how populations of humans or agents can establish and share semiotic systems, typically driven by their use in communication. Indeed, the emergence of a folksonomy exhibits dynamical aspects also observed in human languages (7,8), such as the crystallization of naming conventions, competition between terms, takeovers by neologisms, and more. In the following, we adopt the point of view of complex systems science and try to understand how the "microscopic" tagging activity of users causes the emergence of the high-level features we observe for the ensuing folksonomy. We ground our analysis on actual tagging data extracted from Del...
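A toy implementation of a stochastic process combining the two ingredients named in the abstract, frequency bias via copying past tags, and a fat-tailed memory kernel, might look as follows. The parameter names and values (p_new, tau) are illustrative assumptions, not values taken from the paper:

```python
import random

def tag_stream(n_posts=2000, p_new=0.05, tau=20, seed=1):
    """Toy tag stream: with probability p_new a brand-new tag enters the
    system; otherwise an earlier occurrence is copied (frequency bias),
    with a tag x steps in the past chosen with weight ~ 1/(x + tau),
    i.e. a fat-tailed memory kernel favouring, but not restricted to,
    the recent past."""
    rng = random.Random(seed)
    stream = [0]
    next_tag = 1
    for _ in range(1, n_posts):
        if rng.random() < p_new:
            stream.append(next_tag)   # a brand-new tag appears
            next_tag += 1
        else:
            distances = range(1, len(stream) + 1)
            weights = [1.0 / (x + tau) for x in distances]
            x = rng.choices(distances, weights=weights)[0]
            stream.append(stream[-x])  # copy a tag from x steps back
    return stream

stream = tag_stream()
```

Because old occurrences are never entirely forgotten, popular tags keep being reinforced while new ones still manage to enter, which is the qualitative shape of the dynamics the abstract describes.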
Novelties are a familiar part of daily life. They are also fundamental to the evolution of biological systems, human society, and technology. By opening new possibilities, one novelty can pave the way for others in a process that Kauffman has called “expanding the adjacent possible”. The dynamics of correlated novelties, however, have yet to be quantified empirically or modeled mathematically. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya's urn, predicts statistical laws for the rate at which novelties happen (Heaps' law) and for the probability distribution on the space explored (Zipf's law), as well as signatures of the process by which one novelty sets the stage for another. We test these predictions on four data sets of human activity: the edit events of Wikipedia pages, the emergence of tags in annotation systems, the sequence of words in texts, and listening to new songs in online music catalogues. By quantifying the dynamics of correlated novelties, our results provide a starting point for a deeper understanding of the adjacent possible and its role in biological, cultural, and technological evolution.
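A minimal sketch of such a generalized Polya urn with triggering can make the mechanism concrete. The parameters rho (reinforcement) and nu (number of new possibilities unlocked by each novelty) are illustrative assumptions:

```python
import random

def urn_with_triggering(n_draws=5000, rho=4, nu=3, seed=2):
    """Toy Polya urn with triggering: every drawn colour is reinforced
    with rho extra copies, and drawing a colour for the first time adds
    nu + 1 entirely new colours to the urn -- the 'adjacent possible'
    expanding after each novelty."""
    rng = random.Random(seed)
    urn = [0]
    next_colour = 1
    seen = set()
    sequence = []
    for _ in range(n_draws):
        ball = rng.choice(urn)
        sequence.append(ball)
        urn.extend([ball] * rho)  # reinforcement: rich get richer
        if ball not in seen:      # a novelty: open new possibilities
            seen.add(ball)
            urn.extend(range(next_colour, next_colour + nu + 1))
            next_colour += nu + 1
    return sequence

sequence = urn_with_triggering()
n_distinct = len(set(sequence))  # number of distinct colours explored
```

In this class of models the number of distinct colours grows sublinearly with the number of draws (a Heaps-like law), while colour frequencies are heavy-tailed (a Zipf-like law), which is what the abstract tests against empirical data.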
In this letter we present a very general method to extract information from a generic string of characters, e.g. a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present the implementation of the method on linguistically motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification. (PACS: 89.70.+c,05.) Many systems and phenomena in nature are often represented in terms of sequences or strings of characters. In experimental investigations of physical processes, for instance, one typically has access to the system only through a measuring device which produces a time record of a certain observable, i.e. a sequence of data. Other systems, such as DNA and protein sequences or language, are intrinsically described by strings of characters. When analyzing a string of characters, the main question is how to extract the information it carries. For a DNA sequence this would correspond to the identification of the sub-sequences codifying the genes and their specific functions. For a written text, on the other hand, one is interested in understanding it, i.e. recognizing the language in which the text is written, its author, the subject treated, and eventually the historical background. With the problem cast in this way, one is tempted to approach it from a very interesting point of view: that of information theory [1,2]. In this context the word information acquires a very precise meaning, namely the entropy of the string, a measure of the surprise the source emitting the sequence can reserve for us. Evidently, the word information is used with different meanings in different contexts. Suppose now, for a moment, that we are able to measure the entropy of a given sequence (e.g. a text).
Is it possible to obtain from this measure the information (in the semantic sense) we were trying to extract from the sequence? This is the question we address in this paper. In particular, we define in a very general way a concept of remoteness (or similarity) between pairs of sequences based on their relative informatic content. We devise a method, valid regardless of the nature of the strings of characters, to measure this distance based on data-compression techniques. The specific question we address is whether this informatic distance between pairs of sequences is representative of the real semantic difference between them. It turns out that the answer is yes, at least in the framework of the examples on which we have implemented the method.
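As an illustration of the general idea (not the authors' exact procedure), one can estimate the extra cost of compressing a sequence b after a sequence a using an off-the-shelf compressor such as zlib; the texts below are made-up examples:

```python
import random
import string
import zlib

def csize(data: bytes) -> int:
    """Compressed size in bytes (zlib at maximum compression level)."""
    return len(zlib.compress(data, 9))

def remoteness(a: str, b: str) -> float:
    """Extra cost of compressing b after a, relative to compressing b
    alone: small when b is 'close' to a, near 1 when knowing a tells
    the compressor nothing useful about b."""
    ca = csize(a.encode())
    cb = csize(b.encode())
    cab = csize((a + b).encode())
    return (cab - ca) / cb

english_a = "the quick brown fox jumps over the lazy dog " * 20
english_b = "a lazy dog sleeps while the quick brown fox runs away " * 20
rng = random.Random(0)
noise = "".join(rng.choice(string.ascii_lowercase + " ") for _ in range(900))

d_close = remoteness(english_a, english_b)  # shares vocabulary with a
d_far = remoteness(english_a, noise)        # shares nothing with a
```

The intuition is that a compressor that has just seen a has effectively learned a's statistics, so related sequences become cheaper to encode; this is the sense in which an informatic distance can track semantic closeness.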