All online sharing systems gather data that reflects users' collective behaviour and shared activities. From this data, different kinds of relationships can be extracted and grouped into layers, which are the basic components of the multidimensional social network proposed in the paper. The layers are created on the basis of two types of relations between humans, direct and object-based, which correspond to social and semantic links between individuals, respectively. To better understand the complexity of the social network structure, layers and their profiles were identified and studied on two snapshots of the Flickr population taken at different points in time. Additionally, a separate strength measure was proposed for each layer. The experiments on the Flickr photo sharing system revealed that relationships between users result either from semantic links between the objects they operate on or from the users' social connections. Moreover, the density of the social network increases over time. The second part of the study is devoted to building a social recommender system that supports the creation of new relations between users in a multimedia sharing system. Its main goal is to generate personalized suggestions that are continuously adapted to users' needs, depending on the personal weights assigned to each layer in the multidimensional social network. The conducted experiments confirmed the usefulness of the proposed model.

Keywords: social recommender system; multidimensional social network (MSN); Web 2.0; multi-layered social network; multimedia sharing system (MSS); recommender system; social network analysis
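The idea of per-layer strengths combined through personal layer weights can be sketched as follows. This is an illustrative assumption, not the paper's exact scoring formula: layer names, strength values and weights below are invented for the example.

```python
# Hypothetical sketch: combining per-layer relationship strengths into one
# recommendation score via user-specific layer weights (illustrative only).

def recommend(layers, weights, user, top_k=2):
    """Rank candidate users by the weighted sum of their layer strengths."""
    scores = {}
    for name, graph in layers.items():
        w = weights.get(name, 0.0)
        for candidate, strength in graph.get(user, {}).items():
            scores[candidate] = scores.get(candidate, 0.0) + w * strength
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

layers = {
    "contacts": {"alice": {"bob": 1.0, "carol": 0.5}},  # direct (social) layer
    "tags":     {"alice": {"dave": 0.8, "bob": 0.2}},   # object-based layer
}
weights = {"contacts": 0.7, "tags": 0.3}  # alice's personal layer weights
print(recommend(layers, weights, "alice"))
```

A candidate strong in a highly weighted layer (here "bob" in the contacts layer) outranks candidates known only through a lightly weighted one.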
Abstract: We propose using five data-driven community detection approaches from social network analysis to partition the label space in multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd. We evaluate modularity maximization (using the fast greedy and leading eigenvector approximations), infomap, walktrap and label propagation algorithms. For this purpose, we construct a label co-occurrence graph (in both weighted and unweighted versions) from the training data and perform community detection to partition the label set. Each partition then constitutes the label space for a separate multi-label classification sub-problem, so that the resulting ensemble of multi-label classifiers jointly covers the whole label space. Based on the binary relevance and label powerset classification methods, we compare community-detection-based label space divisions against random baselines on 12 benchmark datasets over five evaluation measures. We discover that data-driven approaches are more efficient and more likely to outperform RAkELd than binary relevance or label powerset is, in every evaluated measure. For all measures apart from Hamming loss, data-driven approaches are significantly better than RAkELd (α = 0.05), and at least one data-driven approach is more likely to outperform RAkELd than a priori methods, even in the case of RAkELd's best performance. This is the largest RAkELd evaluation published to date, with 250 samplings per value for 10 values of the RAkELd parameter k on 12 datasets.
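The pipeline described above, building a weighted label co-occurrence graph from training label sets and partitioning it, can be sketched in a few lines. As an assumption for brevity, the sketch uses connected components as the simplest possible graph partition; the paper itself uses proper community detection algorithms (fast greedy, leading eigenvector, infomap, walktrap, label propagation) at that step.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(label_sets):
    """Weighted label co-occurrence graph: edge weight counts how many
    training examples carry both labels."""
    w = defaultdict(int)
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            w[(a, b)] += 1
    return w

def connected_components(edges, nodes):
    """Stand-in partitioner (real community detection goes here)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        parts.append(sorted(comp))
    return parts

train = [{"cat", "pet"}, {"dog", "pet"}, {"car"}, {"car", "road"}]
g = cooccurrence_graph(train)
labels = {l for s in train for l in s}
print(connected_components(g.keys(), labels))
```

Each resulting label subset then defines one sub-problem for a binary relevance or label powerset classifier in the ensemble.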
Information spreading in complex networks is often modeled as information diffusing with a certain probability from nodes that possess it to neighbors that do not. Information cascades are triggered when the activation of a set of initial nodes, the seeds, results in diffusion to a large number of nodes. Here, several novel approaches for seed initiation are introduced that replace the commonly used activation of all seeds at once with a sequence of initiation stages. Sequential strategies at later stages avoid seeding highly ranked nodes that have already been activated by the diffusion running between stages. The gain arises when a saved seed is allocated to a node that is difficult to reach via diffusion. Sequential seeding and the single-stage approach are compared using various seed ranking methods and diffusion parameters on real complex networks. The experimental results indicate that, regardless of the seed ranking method used, sequential seeding strategies deliver better coverage than single-stage seeding in about 90% of cases. Longer seeding sequences tend to activate more nodes, but they also extend the duration of diffusion. Different variants of sequential seeding resolve the trade-off between coverage and speed of diffusion differently.

Making complex decisions is difficult, so it is often worth making partial decisions and tracking their consequences before proceeding further. Such a strategy has proven useful in areas such as the general theory of decision making [1, 2], financial markets [3, 4], epidemiology [5] and marketing [6]. Here, we show that a sequential, consecutive approach is also highly efficient in choosing the individuals, called seeds, that, when activated, will widely spread information or opinion in a social network. Current research on influence maximization and information spread in complex networks focuses mainly on single-stage seed initiation.
An exception is new product adoption with early diffusion of product samples [7, 8] to benefit from consumer responses and product spread. The main challenge is finding a method of selecting seeds so as to maximize the final spread of information within the network. If the total number of seeds is limited, e.g. due to a restricted budget, a typical approach is to rank all nodes in the network according to some criterion, select the top n nodes as seeds and activate them all at once to initiate the diffusion. The influence maximization problem in complex networks was defined by Kempe [9]. Analyses of various factors affecting diffusion and social influence in complex networks include the efficiency of different centrality measures for ranking and selecting influencers [10], the impact of homophily on successful seeding [11], heterogeneous thresholds under congestion [12], finding the critical initiator fraction beyond which the cascade becomes global [13], and the importance of different network features in predicting spread [14]. The selection of initial seeds was also analyzed, including incentives for innovators to start diffusion [15] and the multi-market entry pe...
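The contrast between single-stage and sequential seeding can be sketched with a toy independent-cascade simulation. The graph, ranking and parameter values below are invented for illustration; the point is only the mechanism of skipping already-activated nodes and reallocating the saved seeds.

```python
import random

def diffuse(adj, active, p, rng):
    """Independent-cascade pass: each newly active node gets one chance
    to activate each inactive neighbour with probability p."""
    frontier = set(active)
    while frontier:
        nxt = set()
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() <= p:
                    active.add(v)
                    nxt.add(v)
        frontier = nxt
    return active

def single_stage(adj, ranking, budget, p=1.0, seed=0):
    """Activate the top-ranked nodes all at once, then let diffusion run."""
    return diffuse(adj, set(ranking[:budget]), p, random.Random(seed))

def sequential(adj, ranking, budget, stages, p=1.0, seed=0):
    """Spend the budget over several stages, skipping nodes that diffusion
    already activated between stages and reallocating the saved seeds."""
    rng = random.Random(seed)
    active = set()
    for _ in range(stages):
        fresh = [n for n in ranking if n not in active][:budget // stages]
        active |= set(fresh)
        active = diffuse(adj, active, p, rng)
    return active

# Two disconnected pairs; the ranking puts both nodes of one pair first.
adj = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
ranking = ["a", "b", "c", "d"]
print(sorted(single_stage(adj, ranking, budget=2)))          # one pair only
print(sorted(sequential(adj, ranking, budget=2, stages=2)))  # both pairs
```

With the same budget, the single-stage strategy wastes its second seed on a node that diffusion would reach anyway, while the sequential strategy redirects it to the unreachable component.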
One of the most popular methods of estimating the complexity of networks is to measure the entropy of network invariants, such as adjacency matrices or degree sequences. Unfortunately, entropy and all entropy-based information-theoretic measures have several vulnerabilities. These measures are neither independent of a particular representation of the network nor able to capture the properties of the generative process that produces the network. Instead, we advocate the use of algorithmic entropy as the basis for a complexity definition for networks. Algorithmic entropy (also known as Kolmogorov complexity, or K-complexity for short) evaluates the complexity of the description required for a lossless recreation of the network. This measure is not affected by a particular choice of network features, and it does not depend on the method of network representation. We perform experiments on Shannon entropy and K-complexity for gradually evolving networks. The results of these experiments point to K-complexity as the more robust and reliable measure of network complexity. The original contribution of the paper includes the introduction of several new entropy-deceiving networks and an empirical comparison of entropy and K-complexity as fundamental quantities for constructing complexity measures for networks.
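The gap between the two quantities can be illustrated in a few lines. K-complexity itself is uncomputable; the sketch below follows the common practice of using compressed length as a computable stand-in, which is an assumption of this example rather than the paper's exact estimator.

```python
import math
import zlib
from collections import Counter

def degree_entropy(degrees):
    """Shannon entropy (in bits) of a network's degree sequence."""
    n = len(degrees)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(degrees).values()) + 0.0

def k_complexity_proxy(description):
    """Compressed length of a network description: a crude, computable
    upper bound in the spirit of K-complexity (which is uncomputable)."""
    return len(zlib.compress(description.encode()))

# A 10-node ring lattice: every degree is 2, so degree entropy is zero,
# even though the network still needs a non-trivial description.
ring_degrees = [2] * 10
ring_rows = "0110000000" * 10  # illustrative flattened adjacency matrix
print(degree_entropy(ring_degrees))                       # 0.0
print(degree_entropy([1, 1, 2, 2]))                       # 1.0
print(k_complexity_proxy(ring_rows) < len(ring_rows))     # True
```

The regular lattice scores zero degree entropy yet a positive compressed length, which is exactly the kind of mismatch the entropy-deceiving networks in the paper exploit.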
The self-supervised learning (SSL) paradigm is an essential exploration area, which tries to eliminate the need for expensive data labeling. Despite the great success of SSL methods in computer vision and natural language processing, most of them employ contrastive learning objectives that require negative samples, which are hard to define. This becomes even more challenging in the case of graphs and is a bottleneck for achieving robust representations. To overcome such limitations, we propose a framework for self-supervised graph representation learning, Graph Barlow Twins, which utilizes a cross-correlation-based loss function instead of negative samples. Moreover, it does not rely on non-symmetric neural network architectures, in contrast to the state-of-the-art self-supervised graph representation learning method BGRL. We show that our method achieves results as competitive as BGRL, the best self-supervised methods, and fully supervised ones, while requiring substantially fewer hyperparameters and converging an order of magnitude earlier in terms of training steps.

Preprint. Under review.
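The cross-correlation-based objective follows the Barlow Twins formulation: batch-normalize the two views' embeddings, compute their cross-correlation matrix, and push it toward the identity. The plain-Python sketch below (tiny dimensions, default λ chosen for illustration) shows the loss itself, detached from any graph encoder or training loop.

```python
import math

def normalize_columns(z):
    """Standardize each embedding dimension over the batch (mean 0, std 1)."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mu = sum(col) / n
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (z[i][j] - mu) / sd
    return out

def barlow_twins_loss(za, zb, lam=0.005):
    """Cross-correlation loss: diagonal entries of the cross-correlation
    matrix are pulled toward 1, off-diagonal entries toward 0."""
    za, zb = normalize_columns(za), normalize_columns(zb)
    n, d = len(za), len(za[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c = sum(za[k][i] * zb[k][j] for k in range(n)) / n
            loss += (c - 1.0) ** 2 if i == j else lam * c ** 2
    return loss

# Identical, decorrelated views: the loss is exactly zero.
z = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(round(barlow_twins_loss(z, z), 6))
```

Because the objective compares a batch against itself rather than against negatives, no negative sampling and no asymmetric predictor/target branches are needed.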
The problem of finding the optimal set of users for influencing others in a social network has been widely studied. Because it is NP-hard, heuristics have been proposed to find sub-optimal solutions. Still, a common assumption is that seeds are chosen on a static network rather than a dynamic one. This static approach is in fact far from real-world networks, where new nodes may appear and old ones dynamically disappear over time.

The main purpose of this paper is to analyse how the results of one of the typical models for the spread of influence, the linear threshold model, differ depending on the strategy used to build the social network from which seeds are later chosen. To show the impact of the network creation strategy on the final number of influenced nodes, i.e. the outcome of the spread of influence, results for three approaches were studied: one static and two temporal with different granularities, i.e. various numbers of time windows. The social networks for each time window encapsulated dynamic changes in the network structure. The calculation of node structural measures such as degree or betweenness respected these changes by means of a forgetting mechanism: more recent data had greater influence on node measure values. These measures were, in turn, used for node ranking and seed selection.

All concepts were verified experimentally on five real datasets. The results revealed that the temporal approach is always better.
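A minimal sketch of such a forgetting mechanism, assuming an exponential (half-life) decay, which is one common choice rather than the paper's specific weighting: each interaction contributes less to a node's degree the older it is, so recently active nodes climb the seed ranking.

```python
def decayed_degree(events, node, now, half_life=2.0):
    """Degree with a forgetting mechanism: an interaction of age t counts
    0.5 ** (t / half_life), so recent contacts dominate the ranking."""
    score = 0.0
    for t, u, v in events:  # (timestamp, endpoint, endpoint)
        if node in (u, v):
            score += 0.5 ** ((now - t) / half_life)
    return score

# a and b both have raw degree 2, but b's second contact is recent
events = [(0, "a", "b"), (0, "a", "c"), (4, "b", "d")]
print(decayed_degree(events, "a", now=4))  # both contacts decayed
print(decayed_degree(events, "b", now=4))  # one fresh contact counts fully
```

Ranking by this measure instead of raw degree is what lets the temporal approaches react to nodes appearing and fading over time.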
In our study, we examine the impact of citation network structures on the ability to discern valuable research topics in Computer Science literature. We use the bibliographic information available in the DBLP database to extract candidate phrases from scientific paper abstracts. Following that, we construct citation networks based on direct citation, co-citation and bibliographic coupling relationships between the papers. The candidate research topics, in the form of keyphrases and n-grams, are subsequently ranked and filtered by a graph-text ranking algorithm. The selection of the highest-ranked potential topics is further evaluated by domain experts and against the Wikipedia knowledge base. The results obtained from these citation networks are complementary, returning valid but non-overlapping output phrases between some pairs of networks. In particular, bibliographic coupling appears to capture more unique information than either direct citation or co-citation. These findings point towards the possible added value in combining bibliographic coupling analysis with other structures. At the same time, combining direct citation and co-citation is put into question. We expect our findings to be utilised in method design for research topic identification.
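The two indirect relationships compared above can be derived mechanically from reference lists. The sketch below uses invented paper and reference identifiers; it shows only the graph construction step, not the graph-text ranking that follows it.

```python
from collections import defaultdict
from itertools import combinations

def cocitation(cites):
    """Two references are co-cited when a later paper cites both of them;
    the edge weight counts how many papers do so."""
    w = defaultdict(int)
    for refs in cites.values():
        for a, b in combinations(sorted(refs), 2):
            w[(a, b)] += 1
    return dict(w)

def bibliographic_coupling(cites):
    """Two papers are coupled when their reference lists overlap;
    the edge weight is the number of shared references."""
    w = {}
    for a, b in combinations(sorted(cites), 2):
        shared = len(set(cites[a]) & set(cites[b]))
        if shared:
            w[(a, b)] = shared
    return w

cites = {"p1": ["r1", "r2"], "p2": ["r2", "r3"], "p3": ["r1", "r2"]}
print(cocitation(cites))
print(bibliographic_coupling(cites))
```

Note the structural difference: co-citation links the cited works, while bibliographic coupling links the citing ones, which is why the two networks surface non-overlapping phrase sets.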