This paper presents a measurement-based dependability study using event logs collected over about 3 years from 133 Windows NT and 2K workstations and servers interconnected through a LAN. We focus on the identification of machine reboots, the classification of their causes, and the evaluation of statistics characterizing the uptimes, downtimes, and availability of the Windows NT and 2K machines.
This paper presents a measurement-based availability assessment study using field data collected over a 4-year period from 373 SunOS/Solaris Unix workstations and servers interconnected through a local area network. We focus on the estimation of machine uptimes, downtimes, and availability based on the identification of failures that caused total service loss. The data consist of syslogd event logs, which contain a large amount of information about the normal activity of the studied systems as well as their behavior in the presence of failures. It is widely recognized that the information contained in such event logs may be incomplete or imperfect. To address this problem, this paper investigates the use of auxiliary data obtained from the wtmpx files maintained by the SunOS/Solaris Unix operating system. The results suggest that combining wtmpx and syslogd log files gives more complete information on the state of the target systems and yields availability estimates that better reflect reality.
This paper presents a measurement-based availability study of networked Unix systems, based on data collected during 11 months from 298 workstations and servers interconnected through a local area computing network. The data corresponds to event logs recorded by the Unix operating system via the Syslogd daemon. Our study focuses on the identification of machine reboots and the evaluation of statistical measures characterizing: (a) the distribution of reboots (per machine, time), (b) the distribution of uptimes and downtimes associated with these reboots, (c) the availability of machines including workstations and servers, and (d) error dependencies between clients and servers.
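The quantities these studies estimate (uptime, downtime, availability) can be illustrated with a minimal sketch. It assumes that each outage has already been reduced to a pair of timestamps (when the machine went down, when it came back up). The function name `availability_from_outages` and this outage representation are hypothetical and are not taken from any of the papers above. Availability is computed as the fraction of the observation window during which the machine was up.

```python
from datetime import datetime, timedelta

def availability_from_outages(outages, start, end):
    """Return (uptime, downtime, availability) for one machine.

    outages: list of (down_time, up_time) datetime pairs, assumed to be
             non-overlapping and to lie entirely within [start, end].
    start, end: bounds of the observation window.
    """
    # Total downtime is the sum of the individual outage durations.
    downtime = sum((up - down for down, up in outages), timedelta())
    total = end - start
    uptime = total - downtime
    # Dividing two timedeltas yields a float ratio.
    return uptime, downtime, uptime / total

# Hypothetical data: two 12-hour outages in a 10-day window.
window_start = datetime(2000, 1, 1)
window_end = datetime(2000, 1, 11)
example_outages = [
    (datetime(2000, 1, 2, 0, 0), datetime(2000, 1, 2, 12, 0)),
    (datetime(2000, 1, 5, 6, 0), datetime(2000, 1, 5, 18, 0)),
]
up, down, avail = availability_from_outages(example_outages, window_start, window_end)
print(up, down, avail)
```

In practice the hard part, as the abstracts note, is recovering reliable outage boundaries from incomplete logs; this sketch only covers the arithmetic once those boundaries are known.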