Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science 2013
DOI: 10.1145/2534248.2534261

Scalable script-based data analysis workflows on clouds

Abstract: Data analysis workflows are often composed of many concurrent and compute-intensive tasks that can be efficiently executed only on scalable computing infrastructures, such as HPC systems, Grids and Cloud platforms. The use of Cloud services for the scalable execution of data analysis workflows is the key feature of the Data Mining Cloud Framework (DMCF), which provides a Web interface to develop data analysis applications using a visual workflow formalism. In this paper we describe how we extended DMCF to supp…

Cited by 11 publications (7 citation statements)
References 19 publications (23 reference statements)
“…JavaScript for Cloud is a JavaScript-based language for programming data analysis workflows. It has been introduced as the script-based language for the DMCF [21]. The web interface of DMCF allows to design and execute workflows programmed by the JS4Cloud language, by providing an environment similar to that used to develop visual workflows in the same framework.…”
Section: The JavaScript for Cloud Language (mentioning)
confidence: 99%
“…Then, k unlabeled datasets are specified as input, with k = 4 (line 17). Each of the k input datasets is classified by n predictors using the n models generated by J48 and by m predictors using the m models generated by JRip; therefore, for each of the k input datasets, n + m classified datasets are generated (lines 18-22). As a final step, k weighted voters are executed; the i-th voter receives the n + m classified datasets generated from the i-th input and the n + m models and returns the final classified dataset for the i-th input (lines 25-26).…”
Section: Ensemble Learning Workflow (mentioning)
confidence: 99%
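The fan-out described in the statement above (k inputs, n + m models, one weighted voter per input) can be illustrated with a minimal Python sketch. This is not the JS4Cloud script the line numbers in the quote refer to, and it does not use DMCF or Weka: `weighted_vote` and `classify_inputs` are hypothetical names, the models are plain callables standing in for the J48 and JRip predictors, and the voting rule (sum of weights per label) is an assumption, since the excerpt does not say how the voters weight the models.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# A "model" here is any callable mapping one instance to a class label.
# It is a stand-in for a trained J48 or JRip predictor (assumption).
Model = Callable[[float], str]


def weighted_vote(predictions: Sequence[str], weights: Sequence[float]) -> str:
    """Return the label whose supporting models have the largest total weight."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals: Dict[str, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=lambda label: totals[label])


def classify_inputs(
    inputs: Sequence[Sequence[float]],
    models: Sequence[Model],
    weights: Sequence[float],
) -> List[List[str]]:
    """Classify each of the k inputs with all n + m models, then vote per input."""
    results: List[List[str]] = []
    for dataset in inputs:
        # One classified dataset per model: n + m classified datasets per input.
        classified = [[model(x) for x in dataset] for model in models]
        # The voter for this input combines the n + m predictions per instance.
        voted = [
            weighted_vote([c[i] for c in classified], weights)
            for i in range(len(dataset))
        ]
        results.append(voted)
    return results
```

With k = 4 inputs and three toy models, `classify_inputs` returns four voted datasets, one per input, which mirrors the "k weighted voters" step in the quoted description.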
“…The output of the model validation was as follows, as shown in Table 8: the first column is the instance; the second is disregarded because all attributes were marked with a "?", so the actual column can be ignored, as it merely states that each instance belongs to an unknown class; the predicted column shows the prediction for each instance, and the error prediction column reflects the probability that the instance actually belongs to the class (Marozzo, Talia, & Trunfio, 2013).…”
Section: Model Validation (unclassified)
“…Several workflows commonly require the execution of multiple data-intensive operations as loading, transformation, and aggregation (Mattoso et al., 2010). Multiple computational paradigms can be used for the design and execution of workflows, e.g., shell and Python scripts (Marozzo, Talia & Trunfio, 2013), Big Data frameworks (e.g., Hadoop and Spark) (Guedes et al., 2020b), but they are usually managed by complex engines named Workflow Management Systems (WfMS). A key feature that a WfMS must address is the efficient and automatic management of parallel processing activities in High Performance Computing (HPC) environments (Ogasawara et al., 2011).…”
Section: Introduction (mentioning)
confidence: 99%