Abstract. The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons (mathematical, sociological, and commercial) for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
Overview

Few events in the history of computing have wrought as profound an influence on society as the advent and growth of the World-Wide Web. For the first time, millions (soon to be billions) of individuals are creating, annotating and exploiting hyperlinked content in a distributed fashion. A particular Web page might be authored in any language, dialect, or style by an individual with any background, culture, motivation, interest, and education; might range from a few characters to a few hundred thousand; might contain truth, falsehood, lies, propaganda, wisdom, or sheer nonsense; and might point to none, few, or several other Web pages. The hyperlinks of the Web endow it with additional structure, and the network of these links is rich in latent information content. Our focus in this paper is on the directed graph induced by the hyperlinks between Web pages; we refer to this as the Web graph. For our purposes, nodes represent static HTML pages and hyperlinks represent directed edges. Recent estimates [4] suggest that there are several hundred million nodes in the Web graph; this quantity is growing by a few percent a month. The average node has roughly seven hyperlinks (directed edges) to other pages, making for a total of several billion hyperlinks in all.
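To make the graph abstraction concrete, the sketch below is one minimal way to hold such a directed graph as adjacency lists, with one node per static page and one edge per hyperlink; it is not code from the paper, and the class name and URLs are hypothetical placeholders. It also computes the average out-degree that the estimates above put at roughly seven.

```python
# Minimal sketch (not from the paper): the Web graph as adjacency lists.
# Nodes are static HTML pages (placeholder URLs here); hyperlinks are
# directed edges.
from collections import defaultdict

class WebGraph:
    def __init__(self):
        self.nodes = set()
        self.out_links = defaultdict(set)   # page -> pages it links to
        self.in_links = defaultdict(set)    # page -> pages that link to it

    def add_link(self, src, dst):
        """Record a directed hyperlink from page src to page dst."""
        self.nodes.update((src, dst))
        self.out_links[src].add(dst)
        self.in_links[dst].add(src)

    def average_out_degree(self):
        """Average number of hyperlinks per page (about seven on the Web of the time)."""
        if not self.nodes:
            return 0.0
        return sum(len(v) for v in self.out_links.values()) / len(self.nodes)

# Hypothetical usage:
g = WebGraph()
g.add_link("http://example.edu/home", "http://example.org/paper")
g.add_link("http://example.org/paper", "http://example.com/index")
print(g.average_out_degree())
```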
Search obstacles

As we consider the types of pages we hope to discover, and to do so automatically, we quickly confront some difficult problems. Sifting through the growing mountain of Web data demands an increasingly discerning search engine, one that can reliably assess the quality of sites, not just their relevance. First, it is insufficient to apply purely text-based methods to collect many potentially relevant pages.
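To make the contrast with purely text-based ranking concrete, the small sketch below uses link structure as a crude quality signal, ordering textually relevant pages by how many other pages point to them. This is only an illustration, not the algorithm developed in the paper, and the function names, pages, and data are hypothetical.

```python
# Illustrative sketch only: rank pages that already match a query's text by
# in-degree, i.e., how many other pages cite them via hyperlinks.
def rank_by_in_degree(candidate_pages, in_links):
    """Order text-relevant pages by the number of pages linking to them."""
    return sorted(candidate_pages,
                  key=lambda page: len(in_links.get(page, set())),
                  reverse=True)

# Hypothetical usage: two pages judged textually relevant to some query.
candidates = ["http://a.example/cars", "http://b.example/cars"]
in_links = {
    "http://a.example/cars": {"http://x.example", "http://y.example"},
    "http://b.example/cars": {"http://z.example"},
}
print(rank_by_in_degree(candidates, in_links))
```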