Online socio-technical systems can be studied as proxy of the real world to investigate human behavior and social interactions at scale. Here we focus on Instagram, a mediasharing online platform whose popularity has been rising up to gathering hundred millions users. Instagram exhibits a mixture of features including social structure, social tagging and media sharing. The network of social interactions among users models various dynamics including follower/followee relations and users' communication by means of posts/comments. Users can upload and tag media such as photos and pictures, and they can "like" and comment each piece of information on the platform. In this work we investigate three major aspects on our Instagram dataset: (i) the structural characteristics of its network of heterogeneous interactions, to unveil the emergence of self organization and topically-induced community structure; (ii) the dynamics of content production and consumption, to understand how global trends and popular users emerge; (iii) the behavior of users labeling media with tags, to determine how they devote their attention and to explore the variety of their topical interests. Our analysis provides clues to understand human behavior dynamics on socio-technical systems, specifically users and content popularity, the mechanisms of users' interactions in online environments and how collective trends emerge from individuals' topical interests.
Respiratory diseases are among the most common causes of severe illness and death worldwide. Prevention and early diagnosis are essential to limit or even reverse the trend that characterizes the diffusion of such diseases. In this regard, the development of advanced computational tools for the analysis of respiratory auscultation sounds can become a game changer for detecting disease-related anomalies, or diseases themselves. In this work, we propose a novel learning framework for respiratory auscultation sound data. Our approach combines state-of-the-art feature extraction techniques and advanced deep-neural-network architectures. Remarkably, to the best of our knowledge, we are the first to model a recurrent-neural-network based learning framework to support the clinician in detecting respiratory diseases, at either level of abnormal sounds or pathology classes. Results obtained on the ICBHI benchmark dataset show that our approach outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks, thus advancing the state-of-the-art in respiratory disease analysis. arXiv:1907.05708v1 [eess.AS]
The massive presence of silent members in online communities, the so-called lurkers, has long attracted the attention of researchers in social science, cognitive psychology, and computer-human interaction. However, the study of lurking phenomena represents an unexplored opportunity of research in data mining, information retrieval and related fields. In this paper, we take a first step towards the formal specification and analysis of lurking in social networks. We address the new problem of lurker ranking and propose the first centrality methods specifically conceived for ranking lurkers in social networks. Our approach utilizes only the network topology without probing into text contents or user relationships related to media. Using Twitter, Flickr, FriendFeed and GooglePlus as cases in point, our methods' performance was evaluated against datadriven rankings as well as existing centrality methods, including the classic PageRank and alpha-centrality. Empirical evidence has shown the significance of our lurker ranking approach, and its uniqueness in effectively identifying and ranking lurkers in an online social network.
Dealing with structure and content semantics underlying semistructured documents is challenging for any task of document management and knowledge discovery conceived for such data. In this work we address the novel problem of clustering semantically related XML documents according to their structure and content features. XML features are generated by enriching syntactic with semantic information based on a lexical knowledge base. The backbone of the proposed framework for the semantic clustering of XML documents is a data representation model that exploits the notion of tree tuple to identify semantically cohesive substructures in XML documents and represent them as transactional data. This framework is equipped with two clustering algorithms based on different paradigms, namely centroid-based partitional clustering and frequent-itemset-based hierarchical clustering. An extensive experimental evaluation was conducted on real data sets from various domains, showing the significance of our approach as a solution for the semantic clustering of XML documents.
Abstract. We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and updating the representatives as soon as new clusters are detected. We present an algorithm for the computation of an XML representative based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. Experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.