Alongside web search and e-mail, social networking is now one of the three most popular uses of the Internet. Every day, a tremendous number of volunteers write articles and share photos, videos, and links at a scope and scale never imagined before. However, because social network data are huge and come from heterogeneous sources, they are highly susceptible to inconsistency, redundancy, noise, and loss. For data scientists, preparing the data and putting them into a standard format is critical because data quality directly affects the performance of the mining algorithms applied afterward. Low-quality data will limit the analysis and lower the quality of the mining results. To this end, the goal of this study is to provide an overview of the phases involved in data preprocessing, with a focus on social network data. As a case study, we show how we applied preprocessing to the data we collected on Malaysia Airlines Flight MH370, which disappeared in 2014.
Abstract: This paper examines a well-known website, Book-Crossing, from two angles. The first angle focuses on the direct relations between users and books. Much can be inferred from this part of the analysis: who is more interested in reading than others, and why? Which books are most popular, which users are most active, and why? Answering these questions requires social network analysis measures such as degree centrality. The second angle probes the potential social relations between users in this community. What does it mean when two users like the same book? Is it the same when two other users have one thousand books in common? Who is more likely to be a friend of whom, and why? Are there specific people in the community who are better placed to establish large circles of social relations? These questions, among others, are answered in this second part of the analysis. Although these relationships do not exist explicitly, they can be inferred with the help of affiliation network analysis and techniques such as m-slices. The Book-Crossing dataset, which covers four weeks of user activity during 2004, has long been a focus of investigation for researchers interested in discovering patterns in users' preferences in order to offer the most accurate recommendations possible. However, the implicit social relationships that emerge among users when they are grouped by similarity in book preferences have not received the same amount of attention. This could be due to the importance recommender systems have attained (compared with other research fields) as a result of the rapid spread of e-commerce websites that seek to market their products online. The social network analysis software Pajek was used to explore different structural aspects of this community, such as brokerage roles, triadic constraints, and levels of cohesion. Some overall statistics were also obtained, such as network density, average geodesic distance, and average degree.
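The affiliation-network ideas named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' Pajek workflow: the `ratings` pairs are hypothetical toy data, and the helper names (`two_mode_degrees`, `project_users`, `m_slice`, `density`) are assumptions introduced only for illustration. The sketch counts two-mode degrees, projects the user-book network onto users with tie weights equal to the number of shared books, extracts an m-slice (ties of multiplicity at least m), and computes density.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (user, book) pairs standing in for Book-Crossing ratings.
ratings = [
    ("u1", "b1"), ("u1", "b2"), ("u1", "b3"),
    ("u2", "b1"), ("u2", "b2"),
    ("u3", "b2"), ("u3", "b3"),
    ("u4", "b3"),
]


def two_mode_degrees(pairs):
    """Group the two-mode network: books per user and users per book."""
    books_of = defaultdict(set)
    users_of = defaultdict(set)
    for user, book in pairs:
        books_of[user].add(book)
        users_of[book].add(user)
    return books_of, users_of


def project_users(users_of):
    """One-mode projection: tie weight = number of books two users share."""
    weights = defaultdict(int)
    for readers in users_of.values():
        for a, b in combinations(sorted(readers), 2):
            weights[(a, b)] += 1
    return dict(weights)


def m_slice(weights, m):
    """Keep ties with multiplicity >= m and the users incident with them."""
    edges = {pair: w for pair, w in weights.items() if w >= m}
    nodes = {user for pair in edges for user in pair}
    return nodes, edges


def density(n_nodes, n_edges):
    """Share of possible undirected ties that are actually present."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


books_of, users_of = two_mode_degrees(ratings)
user_degree = {user: len(books) for user, books in books_of.items()}
book_degree = {book: len(users) for book, users in users_of.items()}

weights = project_users(users_of)
slice_nodes, slice_edges = m_slice(weights, 2)
user_density = density(len(books_of), len(weights))

print("books per user:", user_degree)
print("users per book:", book_degree)
print("shared-book ties:", weights)
print("2-slice users:", sorted(slice_nodes))
print("density of user projection:", round(user_density, 3))
```

Raising `m` keeps only users who share many books, which is one way to separate an incidental overlap of a single title from a strong similarity in taste.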