Execution time prediction for grid infrastructures based on runtime provenance data

Malik, Muhammad Junaid; Fahringer, Thomas; Prodan, Radu

doi:10.1145/2534248.2534253

Cited by 6 publications (5 citation statements); references 21 publications.

“…It would be interesting to uncover hidden patterns or draw new insights by applying data mining techniques on provenance data. So far, a rich body of existing literature has focused on (i) exploring a workflow pattern that frequently appears [13][14][15], (ii) applying provenance data to scheduling or optimizing simulations and workflows that are in execution [16][17][18], and (iii) estimating when to complete a specified workflow (or simulation) [19,20]. Recently, there has been a growing need to apply a variety of data mining techniques to develop frequent pattern mining and classification to provenance data.…”

Section: Advanced Utilization of Simulation Provenance (mentioning)

A survey of simulation provenance systems: modeling, capturing, querying, visualization, and advanced utilization

Suh

Lee

2018

Hum. Cent. Comput. Inf. Sci.

Research and education through computer simulation have been actively conducted in various scientific and engineering fields, including computational science and engineering. Accordingly, considerable attention has been paid to actively utilizing provenance information about such computer simulations, particularly those conducted on high-performance computing and storage resources. In this manuscript we provide a comprehensive survey of a wide range of existing systems that utilize provenance data produced by simulation. Specifically, we (1) categorize extant provenance research articles into several major themes according to well-motivated criteria, (2) identify and compare the primary functions and features of the existing systems in each category, and (3) propose new research directions that have not yet been pioneered. In particular, we present a taxonomy of scientific platforms regarding provenance support and holistically tabulate the major functionalities and supporting levels of the studied systems. Finally, we conclude this article with a summary of our contributions.

Section: Advanced Utilization of Simulation Provenance (mentioning)

“…Malik's group [19] suggested a method of predicting the execution time of a computing job on Grid infrastructures, via machine learning methods. For model training, they utilized provenance data in association with job execution.…”

Section: Execution Performance Prediction (mentioning)

A survey of simulation provenance systems: modeling, capturing, querying, visualization, and advanced utilization

Suh

Lee

2018

Hum. Cent. Comput. Inf. Sci.

“…The problem of learning cost estimators has been addressed in the recent past, but mainly for specific scenarios that are relevant to data analytics, namely workflow-based programming on clouds and grid [24,25]. But for instance [26] showed that runtime, especially in the case of machine learning algorithms, may depend on features that are specific to the input, and thus not easy to learn.…”

Section: Estimation Impact and Cost of Refresh (mentioning)

Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study

Cała

Missier

2018

Big Data Research

In Data Science, knowledge generated by a resource-intensive analytics process is a valuable asset. Such value, however, tends to decay over time as a consequence of the evolution of any of the elements the process depends on: external data sources, libraries, and system dependencies. It is therefore important to be able to (i) detect changes that may partially or completely invalidate prior outcomes, (ii) determine the impact that those changes will have on those prior outcomes, ideally without having to perform expensive re-computations, and (iii) optimise the process re-execution needed to selectively refresh affected outcomes. This paper presents an extensive experimental study on how the selective re-computation problem manifests itself in a relevant analytics task for Genomics, namely variant calling and clinical interpretation, and how the problem can be addressed using a combination of approaches. Starting from this experience, we then offer a blueprint for a generic re-computation meta-process that makes use of process history metadata to make informed decisions about selective recomputations in reaction to a variety of changes in the data.

“…Learning cost estimators. This problem has been addressed in the recent past, but mainly for specific scenarios that are relevant to data analytics, namely workflow-based programming on clouds and grid, [17,12]. But for instance [14] showed that runtime, especially in the case of machine learning algorithms, may depend on features that are specific to the input, and thus not easy to learn.…”

Section: Process Management Challenges (mentioning)

Preserving the Value of Large Scale Data Analytics over Time Through Selective Re-computation

Missier

Cała

Rathi

2017

Lecture Notes in Computer Science

A pervasive problem in Data Science is that the knowledge generated by possibly expensive analytics processes is subject to decay over time, as the data used to compute it drifts, the algorithms used in the processes are improved, and the external knowledge embodied by reference datasets used in the computation evolves. Deciding when such knowledge outcomes should be refreshed, following a sequence of data change events, requires problem-specific functions to quantify their value and its decay over time, as well as models for estimating the cost of their re-computation. What makes this problem challenging is the ambition to develop a decision support system for informing data analytics re-computation decisions over time that is both generic and customisable. With the help of a case study from genomics, in this vision paper we offer an initial formalisation of this problem, highlight research challenges, and outline a possible approach based on the collection and analysis of metadata from a history of past computations.
