2015 International Conference on High Performance Computing & Simulation (HPCS) 2015
DOI: 10.1109/hpcsim.2015.7237088

A workflow-enabled big data analytics software stack for eScience

Cited by 12 publications (10 citation statements). References 10 publications.
“…First, a self join is applied on the row delta tables based on the id column of both "DDB1" and "DDB2" tables. Note that the id column should be provided by the user due to the lack of primary keys when…”
Column statistics from the same excerpt: (9) total number of differences in data; (10) maximum difference in data; (11) minimum difference in data; (12) first quartile (25th percentile); (13) second quartile (50th percentile), i.e. the median; (14) third quartile (75th percentile).
Fig. 9: The column validation step.
Section: Column Validation (mentioning, confidence: 99%)
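The join-then-compare step described in this excerpt can be sketched as below. This is a minimal illustration under stated assumptions, not the citing paper's Hive implementation: each row delta table is assumed to be a list of Python dicts, the user-supplied id column is assumed to be unique within each table, a "difference" is taken to be the absolute difference between the two tables' values for the same id, and the function names (`join_on_id`, `column_differences`, `difference_stats`) are hypothetical. Only statistics (9) to (11) are computed here.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]  # one row of a delta table: column name -> value


def join_on_id(ddb1: List[Row], ddb2: List[Row], id_col: str) -> List[Tuple[Row, Row]]:
    """Inner-join the two row delta tables on a user-supplied id column.

    Assumes id values are unique within each table: no primary key is
    enforced, so the caller must guarantee this.
    """
    by_id = {row[id_col]: row for row in ddb2}
    return [(r1, by_id[r1[id_col]]) for r1 in ddb1 if r1[id_col] in by_id]


def column_differences(pairs: List[Tuple[Row, Row]], column: str) -> List[float]:
    """Absolute differences in one column, for joined rows whose values differ."""
    return [abs(a[column] - b[column]) for a, b in pairs if a[column] != b[column]]


def difference_stats(diffs: List[float]) -> dict:
    """Statistics (9)-(11): number of differences, maximum and minimum difference."""
    if not diffs:
        return {"total": 0, "max": None, "min": None}
    return {"total": len(diffs), "max": max(diffs), "min": min(diffs)}


# Hypothetical sample data for a single numeric column.
ddb1 = [{"id": 1, "temp": 20.0}, {"id": 2, "temp": 21.5}, {"id": 3, "temp": 19.0}]
ddb2 = [{"id": 1, "temp": 20.0}, {"id": 2, "temp": 22.0}, {"id": 3, "temp": 17.0}]
diffs = column_differences(join_on_id(ddb1, ddb2, "id"), "temp")
print(difference_stats(diffs))  # {'total': 2, 'max': 2.0, 'min': 0.5}
```

Rows whose ids appear in only one table are silently dropped by the inner join; a real validation step would probably report those separately.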
“…Column statistics (12), (13), and (14): Hive provides a function called percentile that can be used to calculate the lowest and highest quartile of a data set along with the median value "percentile (BIGINT col, array (p1 [, p2] ... ))" [34]. This function calculates the specified percentiles for a data set, which is the list of differences for each column in this case.…”
Section: Create (mentioning, confidence: 99%)
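The quartiles (12) to (14) come from an exact percentile computation. The sketch below assumes linear interpolation between the two closest ranks at position (n − 1)·p, which is how I understand Hive's `percentile` to behave; the exact agreement with Hive should be checked against its documentation. The function name `percentile` is reused only for readability. Unlike Hive, which returns NULL for an empty group, this version raises an error.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], ps: Sequence[float]) -> List[float]:
    """Exact percentiles by linear interpolation at position (n - 1) * p.

    Assumed (not verified) to mirror Hive's percentile(BIGINT col, array(p1, ...)).
    """
    if not values:
        raise ValueError("percentile of an empty column is undefined")
    data = sorted(values)
    out = []
    for p in ps:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must lie in [0, 1]")
        pos = (len(data) - 1) * p
        lo, hi = math.floor(pos), math.ceil(pos)
        # Interpolate between the two closest ranks; lo == hi when pos is whole.
        out.append(data[lo] + (pos - lo) * (data[hi] - data[lo]))
    return out


# First quartile, median and third quartile of a hypothetical list of differences.
print(percentile([10, 20, 30, 40], [0.25, 0.5, 0.75]))  # [17.5, 25.0, 32.5]
```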
“…The Ophidia workflow management system [35] is a core component of the Ophidia platform. It allows coordinating and orchestrating the execution of scientific experiments composed of multiple data analytics, processing and visualization operators (e.g.…”
Section: Implementation Details (mentioning, confidence: 99%)
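Coordinating an experiment made of multiple operators generally means running each operator once its inputs are available. The sketch below shows that idea with a generic dependency-driven scheduler built on Python's standard `graphlib` module (Python 3.9+). It is not the Ophidia workflow management system's API or workflow format; the task names and functions are hypothetical stand-ins for analytics, processing and visualization operators.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List, Tuple


def run_workflow(
    tasks: Dict[str, Callable[[dict], object]],
    deps: Dict[str, Iterable[str]],
) -> Tuple[dict, List[List[str]]]:
    """Execute tasks in dependency order and record which became ready together.

    tasks maps a task name to a function receiving the results produced so far;
    deps maps a task name to the names of the tasks it depends on.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        preds = list(deps.get(name, ()))
        unknown = [p for p in preds if p not in tasks]
        if unknown:
            raise ValueError(f"{name} depends on unknown tasks {unknown}")
        sorter.add(name, *preds)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    results: dict = {}
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # mutually independent: could run concurrently
        waves.append(ready)
        for name in ready:
            results[name] = tasks[name](results)
            sorter.done(name)
    return results, waves


# Hypothetical operators: load data, compute two reductions, then report.
tasks = {
    "load": lambda r: [3, 1, 4, 1, 5, 9],
    "max": lambda r: max(r["load"]),
    "mean": lambda r: sum(r["load"]) / len(r["load"]),
    "report": lambda r: f"max={r['max']}, mean={r['mean']:.2f}",
}
deps = {"max": ["load"], "mean": ["load"], "report": ["max", "mean"]}
results, waves = run_workflow(tasks, deps)
print(waves)  # [['load'], ['max', 'mean'], ['report']]
print(results["report"])  # max=9, mean=3.83
```

Each "wave" groups tasks whose dependencies are all satisfied; this sketch runs them sequentially, but a real orchestrator could dispatch each wave in parallel.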
“…Jobs ()
Apache Spark [12] Runtime RDD ()
Fireworks [13] Distributed -()
Pegasus [14] (Dist.) Jobs -
TaskFarmer [15] Commands Files/Shards
Tigres [16] Runtime "Inputs" -
Ophidea [17] Runtime Datasets
Kepler [18] Runtime () -…”
Section: B. Common HPC Workflows (mentioning, confidence: 99%)