Internet social networks have become a ubiquitous application allowing people to easily share text, pictures, and audio and video files. Popular networks include WhatsApp, Facebook, Reddit and LinkedIn. We present an extensive study of the usage of the WhatsApp social network, an Internet messaging application that is quickly replacing SMS messaging. In order to better understand people's use of the network, we provide an analysis of over 6 million messages from over 100 users, with the objective of building demographic prediction models using activity data. We performed extensive statistical and numerical analysis of the data and found significant differences in WhatsApp usage across people of different genders and ages. We also fed the data into the Weka data mining package and studied models created from decision tree and Bayesian network algorithms. We found that different genders and age demographics had significantly different usage habits in almost all message and group attributes. We also noted differences in users' group behavior and created prediction models for it, including the likelihood that a given group would have relatively more file attachments, a larger number of participants, a higher frequency of activity, quicker response times and shorter messages. We were successful in quantifying and predicting a user's gender and age demographic. Similarly, we were able to predict different types of group usage. All models were built without analyzing message content. We present a detailed discussion of the specific attributes that were contained in all predictive models and suggest possible applications based on these results.
In recent years, social networks have surged in popularity. One key aspect of social network research is identifying important missing information which is not explicitly represented in the network, or is not visible to all. To date, this line of research has typically focused on finding the connections that are missing between nodes, a challenge typically termed the Link Prediction Problem. This paper introduces the Missing Node Identification problem, where missing members in the social network structure must be identified. In this problem, indications of missing nodes are assumed to exist. Given these indications and a partial network, we must assess which indications originate from the same missing node and determine the full network structure. Towards solving this problem, we present the MISC Algorithm (Missing node Identification by Spectral Clustering), an approach based on a spectral clustering algorithm, combined with nodes' pairwise affinity measures which were adopted from link prediction research. We evaluate the performance of our approach in different problem settings and scenarios, using real-life data from Facebook. The results show that our approach can be effective in solving the Missing Node Identification Problem. In addition, this paper presents R-MISC, which uses a sparse matrix representation, efficient algorithms for calculating the nodes' pairwise affinity and a proprietary dimension reduction technique to enable scaling the MISC algorithm to large networks of more than 100,000 nodes. Last, we consider problem settings where some of the indications are unknown. Two algorithms are suggested for this problem: Speculative MISC, based on MISC, and Missing Link Completion, based on classical link prediction literature. We show that Speculative MISC outperforms Missing Link Completion.
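The general idea of grouping indications by spectral clustering over a pairwise affinity matrix can be illustrated with a small sketch. This is not the authors' MISC implementation; it is a simplified two-cluster variant under several assumptions that do not come from the abstract: each indication is modelled as a placeholder attached to one known node (its "anchor"), the affinity between two placeholders is the Jaccard similarity of their anchors' closed neighborhoods (one common link prediction measure, chosen here only for illustration), and the names `placeholder_affinity` and `spectral_bipartition` are hypothetical.

```python
import math

def jaccard(a, b):
    """Jaccard similarity of two sets (assumed affinity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def placeholder_affinity(adj, anchors):
    """Affinity matrix between placeholders.

    adj:     adjacency of the known network, as a dict of sets.
    anchors: anchors[i] is the known node that placeholder i is attached to.
    """
    closed = {v: adj[v] | {v} for v in adj}  # closed neighborhoods
    n = len(anchors)
    return [[0.0 if i == j else jaccard(closed[anchors[i]], closed[anchors[j]])
             for j in range(n)] for i in range(n)]

def spectral_bipartition(A, iters=200):
    """Split items into two groups by the sign of the second eigenvector
    of the normalized affinity matrix D^-1/2 A D^-1/2.

    Assumes A is symmetric, has at least two rows, and every row has a
    positive sum. Uses deflated power iteration instead of a library solver.
    """
    n = len(A)
    deg = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # Shift to (I + M) / 2 so all eigenvalues lie in [0, 1].
    B = [[0.5 * ((i == j) + inv_sqrt[i] * A[i][j] * inv_sqrt[j])
          for j in range(n)] for i in range(n)]
    # The top eigenvector is known in closed form: D^1/2 * 1, normalized.
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]
    v = [float(i + 1) for i in range(n)]  # fixed start vector
    for _ in range(iters):
        dot = sum(x * t for x, t in zip(v, top))
        v = [x - dot * t for x, t in zip(v, top)]  # remove top component
        v = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    return [1 if x >= 0 else 0 for x in v]

# Toy known network: two triangles (a, b, c) and (d, e, f) joined by c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
anchors = ["a", "b", "d", "e"]  # four indications of missing nodes
A = placeholder_affinity(adj, anchors)
labels = spectral_bipartition(A)
# Placeholders sharing a label are guessed to come from the same missing node.
```

In this toy network the placeholders anchored at a and b end up in one group and those anchored at d and e in the other. A full treatment would cluster into as many groups as there are missing nodes, typically by running k-means on several eigenvectors rather than splitting on the sign of one.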
An important area of social network research is identifying missing information which is not explicitly represented in the network, or is not visible to all. Recently, the Missing Node Identification problem was introduced, where missing members in the social network structure must be identified. However, previous works did not consider the possibility that information about specific users (nodes) within the network could be useful in solving this problem. In this paper, we present two algorithms: SAMI-A and SAMI-N. Both of these algorithms use the known nodes' specific information, such as demographic information and the nodes' historical behavior in the network. We found that both SAMI-A and SAMI-N perform significantly better than other missing node algorithms. However, as each of these algorithms and the parameters within these algorithms often perform better in specific problem instances, a mechanism is needed to select the best algorithm and the best variation within that algorithm. Towards this challenge, we also present OASCA, a novel online selection algorithm. We present results that detail the success of the algorithms presented within this paper.
An important area of social network research is identifying missing information which is not visible or explicitly represented in the network. Recently, the Missing Node Identification problem was introduced, where missing members in the social network structure must be identified. However, previous works did not consider the possibility that information about specific users (nodes) within the network may be known and could be useful in solving this problem. Assuming that information such as user demographics and users' historical behavior in the network is known, more effective algorithms for the Missing Node Identification problem could potentially be developed. In this paper, we present three algorithms, SAMI-A, SAMI-C and SAMI-N, which leverage this type of information in order to perform significantly better than previous missing node algorithms. However, as each of these algorithms and the parameters within these algorithms often perform better in specific problem instances, a mechanism is needed to select the best algorithm and the best variation within that algorithm. Towards this challenge, we also present OASCA, a novel online selection algorithm. We present results that detail the success of the algorithms presented within this paper.