2013
DOI: 10.1504/ijcc.2013.055265

Scalable data management for map-reduce-based data-intensive applications: a view for cloud and hybrid infrastructures

Abstract: As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, today's frameworks which support it still have substantial shortcomings that limit its potential scalability. In this paper we discuss several directions where there is room for such progress: they concern storage efficiency under

citations
Cited by 13 publications
(14 citation statements)
references
References 29 publications
“…Another question to consider is that several biological databases are dispersed across different institutions like Gene Report, Ensembl, and others. The solutions proposed for the hybrid infrastructure consider this heterogeneous scenario and are based on the scope of the MR ANR project, in the context of biochemical research to produce medicines.…”
Section: Background and Related Work
mentioning
confidence: 99%
“…Another question to consider is that several biological databases are dispersed across different institutions like Gene Report [19], Ensembl [20], and others. The solutions proposed for the hybrid infrastructure consider this heterogeneous scenario and are based on the scope of the MR ANR project § [14], in the context of biochemical research to produce medicines. Some researchers [21-23] have put forward Hadoop implementations based on a geo-distributed dataset in multiple data centers. The authors state that, for instance, it is possible to have multiple execution paths for carrying out an MR job in this scenario, and the performance can carry out a great deal.…”
mentioning
confidence: 99%
“…Concerning the implementation of Active Data, we plan to investigate rollback mechanisms for fault-tolerant execution of applications and evaluate distributed implementations of the publish/subscribe substrate. Finally, several application prototypes are being developed using Active Data: a MapReduce runtime which mixes low power mobile devices (tablets, set-top boxes, smartphones) and online Cloud storage [31], a distributed and cooperative content delivery network to distribute virtual appliance images embedding large HEP applications to Internet Desktop Grid resources [53] and a distributed network of checkpoint image server featuring server selection using network distance [54].…”
Section: Results
mentioning
confidence: 99%
“…For instance, [28][29][30] investigate several options to efficiently deliver data to Cloud, Grids and Desktop Grids infrastructures using P2P systems. More recently, several works have investigated the possibility to execute MapReduce applications on Clouds and Desktop Clouds [31,32]. Our preliminary work around Active Data [33,34] have explored issue of representing data sets when these are distributed on hybrid infrastructures.…”
Section: Related Work
mentioning
confidence: 99%
“…We analyze the characteristics of hybrid MR runtime environment and design the simulator accordingly. The simulator is based on SimGrid [5] and leverages solutions proposed in the scope of the MapReduce ARN project [6].…”
Section: Introduction
mentioning
confidence: 99%