Mihai Preda scite author profile

The computation of page importance in a huge dynamic graph has recently attracted a lot of attention because of the web. Page importance, or page rank is defined as the fixpoint of a matrix equation. Previous algorithms compute it off-line and require the use of a lot of extra CPU as well as disk resources (e.g. to store, maintain and read the link matrix). We introduce a new algorithm OPIC that works on-line, and uses much less resources. In particular, it does not require storing the link matrix. It is on-line in that it continuously refines its estimate of page importance while the web/graph is visited. Thus it can be used to focus crawling to the most interesting pages. We prove the correctness of OPIC. We present Adaptive OPIC that also works on-line but adapts dynamically to changes of the web. A variant of this algorithm is now used by Xyleme.We report on experiments with synthetic data. In particular, we study the convergence and adaptiveness of the algorithms for various scheduling strategies for the pages to visit. We also report on experiments based on crawls of significant portions of the web.

show abstract

Adaptive on-line page importance computation

Abiteboul¹,

Preda²,

Cobena³

2003

View full text Add to dashboard Cite

Monitoring XML data on the Web

Nguyen

Abiteboul

Cobena

et al. 2001

View full text Add to dashboard Cite

Xyleme, a dynamic warehouse for XML data of the Web

Abiteboul

Aguiléra

Ailleret

et al.

View full text Add to dashboard Cite

Monitoring XML data on the Web

et al. 2001

View full text Add to dashboard Cite

We consider the monitoring of a flow of incoming documents. More precisely, we present here the monitoring used in a very large warehouse built from XML documents found on the web. The flow of documents consists in XML pages (that are warehoused) and HTML pages (that are not). Our contributions are the following: a subscription language which specifies the monitoring of pages when fetched, the periodical evaluation of continuous queries and the production of XML reports. the description of the architecture of the system we implemented that makes it possible to monitor a flow of millions of pages per day with millions of subscriptions on a single PC, and scales up by using more machines. a new algorithm for processing alerts that can be used in a wider context. We support monitoring at the page level (e.g., discovery of a new page within a certain semantic domain) as well as at the element level (e.g., insertion of a new electronic product in a catalog). This work is part of the Xyleme system. Xyleme is developed on a cluster of PCs under Linux with Corba communications. The part of the system described in this paper has been implemented. We mention first experiments.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Mihai Preda

Adaptive on-line page importance computation

Adaptive on-line page importance computation

Monitoring XML data on the Web

Xyleme, a dynamic warehouse for XML data of the Web

Monitoring XML data on the Web

Contact Info

Product

Resources

About