VLDB '02: Proceedings of the 28th International Conference on Very Large Databases 2002
DOI: 10.1016/b978-155860869-6/50052-4

Effective Change Detection Using Sampling

Abstract: For a large-scale data-intensive environment, such as the World-Wide Web or data warehousing, we often make local copies of remote data sources. Due to limited network and computational resources, however, it is often difficult to monitor the sources constantly to check for changes and to download changed data items to the copies. In this scenario, our goal is to detect as many changes as we can using the fixed download resources that we have. In this paper we propose three sampling-based download policies tha…

Citations: cited by 66 publications (74 citation statements)
References: 16 publications
“…Several works have developed techniques to crawl the Web efficiently, and to detect and model changes at pages so that the crawler can focus on pages that change frequently [20]. For domain-specific search engines, focused crawling to retrieve Web pages in the domain has been studied [12,71,91], as we mentioned in Section 2.1.…”
Section: Querying With Information Processing Systems
Citation type: mentioning
confidence: 99%
“…The concept of continual queries was conceived to alleviate the impact of changing web content on information retrieval (e.g., [15,7]). Continual query systems guarantee the freshness of web pages by continuously monitoring them with appropriate frequencies.…”
Section: Related Research
Citation type: mentioning
confidence: 99%
“…The problem of managing the freshness of local copies has been explored in the web caching area [19]. Here, however, the model of synchronization is… [footnote 6: Euclidean Distance between two elements is…]”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…We assume that it is possible to obtain the number of updates to an element over some time period. Prior work has shown how the source can use estimation [4] and sampling [6] techniques to obtain a good estimate of these update frequencies. These frequency estimates would be periodically communicated to the mirror.…”
Section: Definition
Citation type: mentioning
confidence: 99%