The unabated growth and increasing significance of the World Wide Web have resulted in a flurry of research activity to improve its capacity for serving information more effectively. At the heart of these efforts, however, lie implicit assumptions about the "quality" and "usefulness" of Web resources and services. This observation points towards measurements and models that quantify various attributes of Web sites. Informetrics, the science of measuring all aspects of information, especially its storage and retrieval, had interested information scientists for decades before the Web existed. Is Web informetrics any different, or is it just an application of classical informetrics to a new medium? In this article, we examine this issue by classifying and discussing a wide-ranging set of Web metrics. We present the origins, measurement functions, formulations and comparisons of well-known Web metrics for quantifying Web graph properties, Web page significance, Web page similarity, search and retrieval, usage characterization and information-theoretic properties. We also discuss how these metrics can be applied to improve Web information access and use.
Influence maximization (IM) is the problem of finding a small subset of nodes (seed nodes) in a social network that could maximize the spread of influence. Despite the progress achieved by state-of-the-art greedy IM techniques, they suffer from two key limitations. Firstly, they are inefficient, as they can take days to find seeds in very large real-world networks. Secondly, although extensive research in social psychology suggests that humans will readily conform to the wishes or beliefs of others, existing IM techniques are surprisingly conformity-unaware. That is, they utilize only an individual's ability to influence another but ignore the conformity (a person's inclination to be influenced) of the individuals. In this paper, we propose a novel conformity-aware cascade (c²) model which leverages the interplay between influence and conformity in obtaining the influence probabilities of nodes from underlying data for estimating influence spreads. We also propose a variant of this model, called the c³ model, that supports context-specific influence and conformity of nodes. A salient feature of these models is that they are aligned with the popular social forces principle in social psychology. Based on these models, we propose a novel greedy algorithm called cinema that generates a high-quality seed set for the IM problem. It first partitions the network into a set of non-overlapping subnetworks, and for each of these subnetworks it computes the influence and conformity indices of nodes by analyzing the sentiments expressed by individuals. Each subnetwork is then associated with a cog-sublist, which stores the marginal gains of the nodes in the subnetwork in descending order. The node with the maximum marginal gain in each cog-sublist is stored in a data structure called the mag-list. These structures are manipulated by cinema to efficiently find the seed set.
A key feature of such a partitioning-based strategy is that each node's influence computation and updates can be limited to the subnetwork in which it resides instead of the entire network. This paves the way for seamless adoption of cinema on a distributed platform. Our empirical study with real-world social networks comprising millions of nodes demonstrates that cinema, as well as its context-aware and distributed variants, generates seed sets of superior quality compared to state-of-the-art IM approaches.
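The partition-then-greedy idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the influence probabilities are assumed to be given as input (the abstract derives them from influence and conformity indices, which are not reproduced here), the spread estimate is a crude one-hop proxy chosen only to keep the example short, and the names `one_hop_spread`, `ranked_gains` and `select_seeds` are hypothetical. The per-partition sorted lists play the role of the sublists of marginal gains, and the list of their head entries plays the role of the list of per-subnetwork maxima.

```python
from typing import Dict, List, Set, Tuple

# u -> {v: assumed probability that u influences v}; the values are illustrative inputs.
Graph = Dict[str, Dict[str, float]]


def one_hop_spread(graph: Graph, part: Set[str], seeds: Set[str]) -> float:
    """Expected number of active nodes in `part` after one hop from `seeds`.

    A simplifying assumption for this sketch, not the cascade model of the paper.
    """
    total = float(len(seeds))
    for v in part - seeds:
        miss = 1.0  # probability that no seed activates v
        for u in seeds:
            miss *= 1.0 - graph.get(u, {}).get(v, 0.0)
        total += 1.0 - miss
    return total


def ranked_gains(graph: Graph, part: Set[str], seeds: Set[str]) -> List[Tuple[float, str]]:
    """Marginal gains of the non-seed nodes of one partition, in descending order."""
    base = one_hop_spread(graph, part, seeds)
    gains = [(one_hop_spread(graph, part, seeds | {u}) - base, u) for u in part - seeds]
    # Ties are broken by node name so the result is deterministic.
    return sorted(gains, key=lambda g: (-g[0], g[1]))


def select_seeds(graph: Graph, parts: List[Set[str]], k: int) -> List[str]:
    """Greedily pick up to k seeds, updating only the partition that was touched."""
    chosen: List[Set[str]] = [set() for _ in parts]
    sublists = [ranked_gains(graph, p, set()) for p in parts]
    seeds: List[str] = []
    while len(seeds) < k:
        # Head entry (largest marginal gain) of every non-empty sublist.
        heads = [(sub[0][0], i) for i, sub in enumerate(sublists) if sub]
        if not heads:
            break
        _, i = max(heads, key=lambda h: (h[0], -h[1]))
        node = sublists[i][0][1]
        seeds.append(node)
        chosen[i].add(node)
        # Only the partition containing the new seed is recomputed.
        sublists[i] = ranked_gains(graph, parts[i], chosen[i])
    return seeds


if __name__ == "__main__":
    graph = {"a": {"b": 0.5, "c": 0.5}, "b": {"a": 0.1}, "c": {}, "x": {"y": 0.9}, "y": {}}
    parts = [{"a", "b", "c"}, {"x", "y"}]
    print(select_seeds(graph, parts, 2))
```

Because a seed's gains are recomputed only inside its own partition, each partition could in principle be handled by a separate worker, which is the property the abstract points to when it mentions a distributed platform.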
Tags associated with social images are a valuable information source for superior image search and retrieval experiences. Although various heuristics can boost tag-based image search, a general framework for studying the impact of these heuristics is lacking. Specifically, the task of ranking images that match a given tag query, based on their associated tags and in descending order of relevance, has not been well studied. In this article, we take a first step by proposing a generic, flexible, and extensible framework for this task and exploit it for a systematic and comprehensive empirical evaluation of various methods for ranking images. To this end, we identify five orthogonal dimensions to quantify the matching score between a tagged image and a tag query. These five dimensions are: (i) tag relatedness, to measure how effectively a tag describes the tagged image; (ii) tag discrimination, to quantify the degree of discrimination of a tag with respect to the entire tagged image collection; (iii) tag length normalization, analogous to document length normalization in web search; (iv) tag-query matching model, for computing the matching score between an image tag and a query tag; and (v) query model, for tag query rewriting. For each dimension, we identify a few implementations and evaluate their impact on the NUS-WIDE dataset, the largest human-annotated dataset, consisting of more than 269K tagged images from Flickr. We evaluate 81 single-tag queries and 443 multi-tag queries over 288 search methods and systematically compare their performance using standard metrics including Precision at top-K, Mean Average Precision (MAP), Recall, and Normalized Discounted Cumulative Gain (NDCG).
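For readers unfamiliar with the ranking metrics named in this abstract, the following is a minimal sketch of their textbook definitions. It is not the evaluation code of the article. It assumes binary relevance labels listed in rank order, assumes each ranked list contains every relevant image for its query (so average precision is normalized by the number of relevant items in the list), and uses the linear-gain form of DCG; other variants exist. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (rels are 0/1 in rank order)."""
    return sum(rels[:k]) / k


def average_precision(rels: Sequence[int]) -> float:
    """Mean of precision values taken at the rank of each relevant result."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Average precision averaged over queries (one relevance list per query)."""
    return sum(average_precision(r) for r in runs) / len(runs)


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG of the ranking divided by the DCG of the same items in ideal order."""

    def dcg(values: Sequence[float]) -> float:
        # Rank i (0-based) is discounted by log2(i + 2), so the first rank is undiscounted.
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


if __name__ == "__main__":
    ranking = [1, 0, 1, 0]  # relevant results at ranks 1 and 3
    print(precision_at_k(ranking, 2))
    print(average_precision(ranking))
    print(ndcg_at_k(ranking, 4))
```

With 288 search methods and several hundred queries, a function such as `mean_average_precision` would be called once per method over that method's per-query ranked lists, and the methods would then be compared on the resulting scores.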