Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science.
Social media data have been increasingly used to study biomedical and health-related phenomena. From cohort-level discussions of a condition to population-level analyses of sentiment, social media have provided scientists with unprecedented amounts of data to study human behavior associated with a variety of health conditions and medical treatments. Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health. We pay particular attention to topics where social media data analysis has shown the most progress, including pharmacovigilance and sentiment analysis, especially for mental health. We also discuss a variety of innovative uses of social media data for health-related applications as well as important limitations of social media data access and use.
Human reproduction does not happen uniformly throughout the year and what drives human sexual cycles is a long-standing question. The literature is mixed with respect to whether biological or cultural factors best explain these cycles. The biological hypothesis proposes that human reproductive cycles are an adaptation to the seasonal (hemisphere-dependent) cycles, while the cultural hypothesis proposes that conception dates vary mostly due to cultural factors, such as holidays. However, for many countries, common records used to investigate these hypotheses are incomplete or unavailable, biasing existing analysis towards Northern Hemisphere Christian countries. Here we show that interest in sex peaks sharply online during major cultural and religious celebrations, regardless of hemisphere location. This online interest, when shifted by nine months, corresponds to documented human births, even after adjusting for numerous factors such as language and amount of free time due to holidays. We further show that mood, measured independently on Twitter, contains distinct collective emotions associated with those cultural celebrations. Our results provide converging evidence that the cyclic sexual and reproductive behavior of human populations is mostly driven by culture and that this interest in sex is associated with specific emotions, characteristic of major cultural and religious celebrations.
Groups of firms often achieve a competitive advantage through the formation of geo-industrial clusters. Although many exemplary clusters are the subjects of case studies, systematic approaches to identify and analyze the hierarchical structure of geo-industrial clusters at the global scale are scarce. In this work, we use LinkedIn’s employment history data from more than 500 million users over 25 years to construct a labor flow network of over 4 million firms across the world, from which we reveal hierarchical structure by applying network community detection. We show that the resulting geo-industrial clusters exhibit a stronger association between the influx of educated workers and financial performance, compared to traditional aggregation units. Furthermore, our analysis of the skills of educated workers reveals richer insights into the relationship between the labor flow of educated workers and productivity growth. We argue that geo-industrial clusters defined by labor flow provide useful insights into the growth of the economy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.