Automating SLA monitoring means minimizing human involvement in the overall monitoring process. SLA monitoring is difficult to automate because it requires a precise and unambiguous SLA specification. It also requires a customizable engine that collects the right measurements, models the data, and evaluates the SLA at specified times or when specified events occur. In addition, most existing SLA approaches neglect client-side measurements or restrict SLAs to server-side measurements only. In a cross-enterprise scenario such as web services, it is important to obtain measurements at multiple sites and to guarantee SLAs on them. In this article we propose an automated and distributed SLA monitoring engine.
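The evaluation step can be made concrete with a minimal per-site evaluator. This is a sketch, not the engine proposed in the article. It assumes a hypothetical SLA of the form "the p-th percentile latency over a sliding window must not exceed a limit". It also assumes that measurements from client and server sites arrive in timestamp order. The names `Measurement`, `SlaMonitor`, and the site labels are illustrative and do not come from the source. `evaluate` covers time-triggered checks, and `record` covers event-triggered ones.

```python
import math
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    site: str          # hypothetical site label, e.g. "client" or "server"
    timestamp: float   # seconds; samples are assumed to arrive in order
    latency_ms: float


class SlaMonitor:
    """Sketch: checks 'p-th percentile latency <= limit' per site over a sliding window."""

    def __init__(self, limit_ms, percentile=95.0, window_s=60.0):
        self.limit_ms = limit_ms
        self.percentile = percentile
        self.window_s = window_s
        self.samples = defaultdict(deque)  # site -> deque of Measurement

    def _trim(self, site, now):
        # Drop samples that have fallen out of the sliding window.
        q = self.samples[site]
        while q and q[0].timestamp < now - self.window_s:
            q.popleft()

    def _percentile(self, values):
        # Nearest-rank percentile of the latencies in the window.
        ordered = sorted(values)
        rank = max(1, math.ceil(self.percentile / 100 * len(ordered)))
        return ordered[rank - 1]

    def evaluate(self, now):
        """Time-triggered check: return {site: (percentile latency, compliant)}."""
        report = {}
        for site in list(self.samples):
            self._trim(site, now)
            q = self.samples[site]
            if q:
                p = self._percentile(m.latency_ms for m in q)
                report[site] = (p, p <= self.limit_ms)
        return report

    def record(self, m):
        """Event-triggered check: store m and return True if its site now violates the SLA."""
        self.samples[m.site].append(m)
        self._trim(m.site, m.timestamp)
        p = self._percentile(x.latency_ms for x in self.samples[m.site])
        return p > self.limit_ms
```

Each site keeps its own window, so a client-side violation can surface even when the server-side measurements are compliant. This is the gap in server-only monitoring that the abstract points out.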
The component-based software development approach has become a trend in integrating modern software systems. To ensure the overall reliability of an integrated software system, its software components have to meet certain reliability requirements, subject to testing-schedule and resource constraints. Improving the efficiency of system testing can be formulated as a combinatorial optimization problem with known cost, reliability, effort, and other attributes of the system components. This paper considers "software component testing resource allocation" for a system with single or multiple applications, each with a pre-specified reliability requirement. The relation between the failure rates of components and the cost of decreasing these rates is modeled by various types of reliability-growth curves. Closed-form solutions are developed for systems with a single application, and the paper then describes how to solve the multiple-application problem using nonlinear programming techniques. The interactions between system components are also examined, and inter-component failure dependencies are included in the modeling formula. In addition to regular systems, the technique is extended to address fault-tolerant systems. A procedure for a systematic approach to the testing resource allocation problem is developed, and its application in a case study of a telecommunications software system is described. This procedure is automated in a reliability allocation tool that allows easy specification of the problem and automatic application of the technique. The methodology provides a basic approach to optimizing testing schedules subject to reliability constraints. It adds interesting new optimization opportunities in the software testing phase to the existing optimization literature, which is concerned with structural optimization of the software architecture. Merging these two approaches improves the accuracy of reliability planning in component-based software development.
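The single-application case can be illustrated with one common reliability-growth curve. The sketch below is not the paper's formulation, which also covers other curve types, multiple applications, and failure dependencies. It assumes, purely for illustration, an exponential curve lambda_i(c) = lambda_i0 * exp(-alpha_i * c) for the failure rate of component i after testing cost c. It also assumes a series-system requirement sum(lambda_i) <= lambda_req. Under these assumptions the problem is convex, and the KKT conditions give a closed form. Every component that receives testing ends at lambda_i = K / alpha_i for a common constant K, and components whose initial rate is already at or below that level receive no testing. The names `Component`, `allocate`, and `final_rate` are hypothetical, and component names are assumed unique.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    lambda0: float  # failure rate before any testing (assumed known)
    alpha: float    # assumed growth parameter: rate = lambda0 * exp(-alpha * cost)


def allocate(components, lambda_req):
    """Minimise total testing cost subject to sum of final failure rates <= lambda_req.

    Assumes exponential reliability-growth curves and a series system, so the
    problem is convex and the KKT conditions characterise the optimum.
    """
    if lambda_req <= 0:
        raise ValueError("lambda_req must be positive")
    active = {c.name for c in components}  # components that receive testing
    k = 0.0
    while active:
        # Failure-rate budget left after untested components keep their lambda0.
        budget = lambda_req - sum(c.lambda0 for c in components if c.name not in active)
        k = budget / sum(1.0 / c.alpha for c in components if c.name in active)
        # A component whose initial rate is already <= k / alpha needs no testing.
        drop = {c.name for c in components
                if c.name in active and c.lambda0 <= k / c.alpha}
        if not drop:
            break
        active -= drop
    return {
        c.name: (math.log(c.lambda0 * c.alpha / k) / c.alpha if c.name in active else 0.0)
        for c in components
    }


def final_rate(c, cost):
    """Failure rate of component c after spending `cost` on testing."""
    return c.lambda0 * math.exp(-c.alpha * cost)
```

K can only grow as components drop out, so the loop never has to re-admit a component. It therefore terminates after at most one pass per component.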
We analyse and optimise the completion time for a class of jobs whose conditional completion time does not always decrease with the time already invested in the job. For such jobs, restarts may speed up completion. Examples of such jobs include downloading web pages, randomised algorithms, distributed queries, and jobs subject to network or other failures. This paper derives computationally attractive expressions for the moments of the completion time of jobs under restarts and provides algorithms that optimise the restart policy. We also identify characteristics of optimal restart times, as well as of probability distributions amenable to restarts.
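The simplest case can be worked through directly. The sketch below assumes restarts at a fixed interval tau, with zero restart overhead and independent attempts. It computes only the first moment, not the higher moments the paper derives. Each attempt lasts min(X, tau) and succeeds with probability F(tau). By Wald's identity, the expected completion time is therefore E[min(X, tau)] = integral from 0 to tau of (1 - F(t)) dt, divided by F(tau). The hyperexponential job-time distribution (mostly fast runs, occasionally very slow ones) and the grid search over tau are illustrative choices, not taken from the paper.

```python
import math


def survival_integral(tau, p, a, b):
    """Integral of 1 - F(t) from 0 to tau for the assumed mixture
    F(t) = 1 - p*exp(-a t) - (1-p)*exp(-b t)."""
    return p * (1 - math.exp(-a * tau)) / a + (1 - p) * (1 - math.exp(-b * tau)) / b


def cdf(t, p, a, b):
    """CDF of the assumed two-phase hyperexponential job time."""
    return 1 - p * math.exp(-a * t) - (1 - p) * math.exp(-b * t)


def expected_time_with_restarts(tau, p, a, b):
    """E[T] when the job is restarted every tau time units (no restart overhead).

    Each attempt lasts min(X, tau) and succeeds with probability F(tau), so
    E[T] = E[min(X, tau)] / F(tau) by Wald's identity.
    """
    return survival_integral(tau, p, a, b) / cdf(tau, p, a, b)


def best_restart_interval(p, a, b, taus):
    """Grid search for the tau with the smallest expected completion time."""
    return min(taus, key=lambda tau: expected_time_with_restarts(tau, p, a, b))


if __name__ == "__main__":
    # Hypothetical job: 90% of runs are fast (mean 0.1), 10% are slow (mean 10).
    p, a, b = 0.9, 10.0, 0.1
    no_restart = p / a + (1 - p) / b
    taus = [0.01 * i for i in range(1, 2001)]
    tau_star = best_restart_interval(p, a, b, taus)
    print(f"mean without restarts: {no_restart:.3f}")
    print(f"best tau: {tau_star:.2f}, mean with restarts: "
          f"{expected_time_with_restarts(tau_star, p, a, b):.3f}")
```

As tau grows, F(tau) approaches 1 and the formula reduces to the plain mean E[X]. Restarts can only help when cutting off a long-running attempt pays off, as it does for this heavy-tailed mixture.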