Self-Healing of Operational Workflow Incidents on Distributed Computing Infrastructures

Silva, Rafael Ferreira da; Glatard, Tristan; Desprez, Frédéric

doi:10.1109/ccgrid.2012.24

Cited by 10 publications

(7 citation statements)

References 28 publications

(32 reference statements)

“…The work presented here is a step in our attempt to control computing platforms where very little is known about applications and resources, and where situations change over time. Our works in [12,20] consider similar platform conditions but they target completely different problems, namely fault-tolerance and granularity control. We believe that results of this paper are the first ones presented to control fairness in such conditions which are often met in production platforms.…”

Section: Results (mentioning)

confidence: 99%

“…Grid conditions vary among repetitions because computing, storage and network resources are shared with other users. We use MOTEUR 0.9.21, configured to resubmit failed tasks up to 5 times, and with the task replication mechanism described in [12] activated. We use the DIRAC v6r5p1 instance provided by France-Grilles, with a first-come, first-served policy imposed by submitting workflows with decreasing priority values.…”

Section: Experiments Conditions (mentioning)

confidence: 99%

“…1. A new instantiation of our control loop [12] to handle unfairness, consisting of (i) an online, non-clairvoyant fairness metric, and (ii) a task prioritization algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

Workflow Fairness Control on Online and Non-clairvoyant Distributed Computing Platforms

Silva, Glatard, Desprez 2013

Euro-Par 2013 Parallel Processing

Self Cite

Fairly allocating distributed computing resources among workflow executions is critical to multiuser platforms. However, this problem remains mostly studied in clairvoyant and offline conditions, where task durations on resources are known, or the workload and available resources do not vary along time. We consider a non-clairvoyant, online fairness problem where the platform workload, task costs and resource characteristics are unknown and not stationary. We propose a fairness control loop which assigns task priorities based on the fraction of pending work in the workflows. Workflow characteristics and performance on the target resources are estimated progressively, as information becomes available during the execution. Our method is implemented and evaluated on 4 different applications executed in production conditions on the European Grid Infrastructure. Results show that our technique reduces slowdown variability by 3 to 7 compared to first-come-first-served.

“…Since the outputs of each task in a workflow become inputs to subsequent tasks, and we use input size to estimate all the target parameters, poor output data size estimates for tasks at higher levels of the workflow may lead to a chain of increasing estimation errors for tasks at subsequent levels. Therefore, in addition to the offline estimation process, we also propose an online estimation process based on the MAPE-K loop (Monitoring, Analysis, Planning, Execution, and Knowledge), where task executions are constantly monitored [41,42]. Upon task completion, estimated values for the task are updated with the real values, and, based on these values, a new prediction is generated (using the regression tree of Fig.…”

Section: Online Task Resource Consumption Prediction For Scientific W… (mentioning)

confidence: 99%

Online Task Resource Consumption Prediction for Scientific Workflows

Silva, Juve, Rynge et al. 2015

Parallel Process. Lett.

Estimates of task runtime, disk space usage, and memory consumption, are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable workflow executions. Such algorithms often assume that accurate estimates are available, but such estimates are difficult to generate in practice. In this work, we first profile five real scientific workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption based on the size of the tasks' input data. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets using a clustering technique. Task estimates are generated based on the ratio parameter/input data size if they are correlated, or based on the probability distribution function of the parameter. We then propose an online estimation process based on the MAPE-K loop, where task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process results in much more accurate predictions than an offline approach, where all task requirements are estimated prior to workflow execution.

“…Instead of directly user input in the system, User defines general procedures and policies that guide the self-management process. IBM defines four main self-* components [7] [41] [42] [43] [44] [45].…”

Section: Introduction (mentioning)

confidence: 99%

Self-Protection against Insider Threats in DBMS through Policies Implementation

Zaman, Raza, Malik et al. 2017

IJACSA

In today's world, information security of an organization has become a major challenge as well as a critical business issue. Managing and mitigating these internal or external security related issues, organizations hire highly knowledgeable security expert persons. Insider threats in database management system (DBMS) are inherently a very hard problem to address. Employees within the organization carry out or harm organization data in a professional manner. To protect and monitor organization information from insider user in DBMS, the organization used different techniques, but these techniques are insufficient to secure their data. We offer an autonomous approach to self-protection architecture based on policy implementation in DBMS. This research proposes an autonomic model for protection that will enforce Access Control policies, Database Auditing policies, Encryption policies, user authentication policies, and database configuration setting policies in DBMS. The purpose of these policies to restrict insider user or Database Administrator (DBA) from malicious activities to protect data.
