Report date: Feb 2009. Sponsoring/monitoring agency: AOARD, Unit 45002, APO, AP, 96337-5002. Sponsor/monitor's acronym: AOARD. Sponsor/monitor's report number: AOARD-084054. Distribution/availability statement: Approved for public release; distribution unlimited.

Abstract

Web monitoring systems report changes on target web pages by revisiting them frequently. Because they operate under significant constraints, such as network and computing capacity, they must minimize revisits while keeping delay minimal and coverage maximal. Various statistical scheduling methods have been proposed to resolve this problem. However, they are static and cannot easily cope with events in the real world. This paper proposes a new scheduling method that manages unpredictable events. An MCRDR (Multiple Classification Ripple-Down Rules) document classification knowledge base was reused to detect events and to initiate prompt web monitoring regardless of the static monitoring schedule. The experiment demonstrates that the proposed approach improves monitoring efficiency significantly.
Limitation of abstract: Same as Report (SAR)

Objectives

One of the main aims of a web monitoring system is to collect information from selected web pages with maximum coverage and minimal delay. To this end, the system needs to revisit each monitored page according to that page's revisit schedule. Without resource constraints, a web monitoring system could obtain high coverage with low delay by revisiting target pages as frequently as possible. However, the revisiting process, which technically means sending and receiving HTTP messages over the internet, is very expensive and is constrained by resource limits, including network and computer capacity. Therefore, it is essential for the web monitoring system to have an efficient scheduling method that minimizes revisit frequency while maximizing coverage and timeliness. Statistical scheduling approaches have provided static solutions to this problem, based on the assumption that stable publication patterns exist. However, real-world web publications are affected ...
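The idea of letting a detected event override a static revisit schedule can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class and function names, the fixed per-page intervals, and above all the keyword matcher, which is only a stand-in for the MCRDR classification knowledge base and is not the report's method.

```python
import heapq


def detect_event(text, keywords):
    """Hypothetical stand-in for the MCRDR classifier: plain keyword matching."""
    words = set(text.lower().split())
    return any(k in words for k in keywords)


class RevisitScheduler:
    """Static-interval revisit queue with event-triggered overrides (illustrative)."""

    def __init__(self, intervals):
        # intervals: {url: seconds between scheduled revisits}, each > 0 (assumed values)
        self.intervals = dict(intervals)
        self.next_due = dict(self.intervals)  # first visit is due one interval after t=0
        self.heap = [(due, url) for url, due in self.next_due.items()]
        heapq.heapify(self.heap)

    def trigger(self, url, now):
        """An event was detected: pull the page's next revisit forward to `now`."""
        if now < self.next_due[url]:
            self.next_due[url] = now
            heapq.heappush(self.heap, (now, url))

    def pop_due(self, now):
        """Return pages due at `now` and reschedule each by its static interval."""
        due = []
        while self.heap and self.heap[0][0] <= now:
            t, url = heapq.heappop(self.heap)
            if t != self.next_due[url]:
                continue  # stale entry superseded by a trigger or a reschedule
            due.append(url)
            self.next_due[url] = now + self.intervals[url]
            heapq.heappush(self.heap, (self.next_due[url], url))
        return due
```

In use, a monitor would call `pop_due(now)` on each tick, fetch the returned pages, run `detect_event` on any new document, and call `trigger` for pages likely to publish related content. How the report maps detected events to pages is not stated in this excerpt, so that mapping is left to the caller.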
In this paper we analyze the Web coverage of three search engines: Google, Yahoo and MSN. We conducted a 15-month study collecting 15,770 Web content or information pages linked from 260 Australian federal and local government Web pages. The key feature of this domain is that new information pages are constantly added, but the 260 Web pages tend to provide links only to the more recently added information pages. Search engines list only some of the information pages, and their coverage varies from month to month. Meta-search engines do little to improve coverage of information pages, because the problem is not the size of Web coverage but the frequency with which information is updated. We conclude that organizations such as governments, which post important information on the Web, cannot rely on all relevant pages being found with conventional search engines and need to consider other strategies to ensure important information can be found.