According to a recent survey made by Nielsen NetRatings, searching on news articles is one of the most important activity online. Indeed, Google, Yahoo, MSN and many others have proposed commercial search engines for indexing news feeds. Despite this commercial interest, no academic research has focused on ranking a stream of news articles and a set of news sources. In this paper, we introduce this problem by proposing a ranking framework which models: (1) the process of generation of a stream of news articles, (2) the news articles clustering by topics, and (3) the evolution of news story over the time. The ranking algorithm proposed ranks news information, finding the most authoritative news sources and identifying the most interesting events in the different categories to which news article belongs. All these ranking measures take in account the time and can be obtained without a predefined sliding window of observation over the stream. The complexity of our algorithm is linear in the number of pieces of news still under consideration at the time of a new posting. This allow a continuous on-line process of ranking. Our ranking framework is validated on a collection of more than 300,000 pieces of news, produced in two months by more then 2000 news sources belonging to 13 different categories (World, U.S, Europe, Sports, Business, etc). This collection is extracted from the index of comeToMyHead, an academic news search engine available online.
Recently, the research community has devoted increased attention to reducing the computational time needed by web ranking algorithms. In particular, many techniques have been proposed to speed up the well-known PageRank algorithm used by Google. This interest is motivated by two dominant factors: (1) the web graph has huge dimensions and is subject to dramatic updates in terms of nodes and links, therefore the PageRank assignment tends to became obsolete very soon; (2) many PageRank vectors need to be computed according to different choices of the personalization vectors or when adopting strategies of collusion detection. In this paper, we show how the PageRank computation in the original random surfer model can be transformed in the problem of computing the solution of a sparse linear system. The sparsity of the obtained linear system makes it possible to exploit the effectiveness of the Markov chain index reordering to speed up the PageRank computation. In particular, we rearrange the system matrix according to several permutations, and we apply different scalar and block iterative methods to solve smaller linear systems. We tested our approaches on web graphs crawled from the net. The largest one contains about 24 millions nodes and more than 100 million links. Upon this web graph, the cost for computing the PageRank is reduced by 65% in terms of Mflops and by 92% in terms of time respect to the power method commonly used.
PageRank algorithm plays a very important role in search engine technology and consists in the computation of the eigenvector corresponding to the eigenvalue one of a matrix whose size is now in the billions. The problem incorporates a parameter that determines the difficulty of the problem. In this paper, the effectiveness of stationary and nonstationary methods are compared on some portion of real web matrices for different choices of . We see that stationary methods are very reliable and more competitive when the problem is well conditioned, that is for small values of . However, for large values of the parameter the problem becomes more difficult and methods such as preconditioned BiCGStab or restarted preconditioned GMRES become competitive with stationary methods in terms of Mflops count as well as in number of iterations necessary to reach convergence.
An integrated model for ranking scientific publications together with authors and journals recently presented in [Bini, Del Corso, Romani, ETNA 2008] is closely analyzed. The model, which relies on certain adjacency matrices H. K and F obtained from the relations of citation, authorship and publication, provides the ranking by means of the Perron vector of a stochastic matrix obtained by combining H. K and F. Some perturbation theorems concerning the Perron vector previously introduced by the authors are extended to more general cases and a counterexample to a property previously addressed by the authors is presented. The theoretical results confirm the consistency and effectiveness of our model. Some paradigmatic examples are reported together with some results obtained on a real set of data. (C) 2010 Elsevier B.V. All rights reserved
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.