SpeCH: A scalable framework for data placement of data-intensive services in geo-distributed clouds

Atrey, Ankita; Seghbroeck, Gregory Van; Mora, Higinio; Turck, Filip De; Volckaert, Bruno

doi:10.1016/j.jnca.2019.05.012

Cited by 21 publications

(13 citation statements)

References 29 publications


“…More specifically, any change (small or large) in the system workload would require re-execution of the full pipeline to obtain the placement output. This design decision is in line with almost every existent technique [3]-[5], [16], [17], [43], [49] in the extensive literature on data placement. Thus, making the CDR placement algorithm dynamically adapt to the changes in the system workload is not in the scope of the current work.…”

Section: Combined Data and Replica Placement (supporting)

confidence: 68%

“…On the other hand, publicly available specialized heuristics for hypergraph partitioning [7] enable graceful scaling of the aforementioned methods to large datasets. Moving further, Atrey et al [3], [5] proposed an algorithm based on spectral clustering of hypergraphs, which portrayed quality similar to the algorithms proposed in [43], however, achieved superior efficiency and scalability owing to the use of randomized eigendecomposition techniques for factorizing the hypergraph laplacian.…”

Section: Related Work (mentioning)

confidence: 98%

“…Although advancements in enabling technologies such as big data and cloud computing have provided us with the necessary machinery and systems (e.g., Apache Hadoop [39] and Spark [46]) to perform data management at scale, effective strategies for data placement and partitioning remain crucial for ensuring the performance of such systems [15]. Having said that, the field of data placement has witnessed a humongous amount of research over the past two decades [3], [4], [17], [30], [36], [41], [43], [45], [48], [49].…”

Section: Motivation (mentioning)

confidence: 99%

“…Therefore, both data and replica placement should be considered as objectives of a single joint optimization problem. Having said that, although the field of data placement has witnessed significant advancements [3], [17], [43], to the best of our knowledge, none of the existing techniques possess the capability of performing data and replica placement jointly. On the contrary, most of the existing techniques treat the two placement steps as independent problems, and perform data placement followed by replica placement (Fig.…”

Section: Motivation (mentioning)

confidence: 99%

“…Owing to their ability to capture both data-item–node and data-item–data-item associations, the methods proposed by Atrey et al [3] and Yu et al [43] (Hyper)graph based solutions have also been popular for data placement of more traditional workloads such as scientific and relational workflows. The existence of a polynomial-time reduction of the data placement problem into an instance of the graph partitioning problem was proved in [17].…”

Section: Related Work (mentioning)

confidence: 99%


UnifyDR: A Generic Framework for Unifying Data and Replica Placement

et al. 2020

Self Cite


The advent of (big) data management applications operating at Cloud scale has led to extensive research on the data placement problem. The key objective of data placement is to obtain a partitioning (possibly allowing for replicas) of a set of data-items into distributed nodes that minimizes the overall network communication cost. Although replication is intrinsic to data placement, it has seldom been studied in combination with the latter. On the contrary, most of the existing solutions treat them as two independent problems, and employ a two-phase approach: (1) data placement, followed by (2) replica placement. We address this by proposing a new paradigm, CDR, with the objective of combining data and replica placement as a single joint optimization problem. Specifically, we study two variants of the CDR problem: (1) CDR-Single, where the objective is to minimize the communication cost alone, and (2) CDR-Multi, which performs a multi-objective optimization to also minimize traffic and storage costs. To unify data and replica placement, we propose a generic framework called UnifyDR, which leverages overlapping correlation clustering to assign a data-item to multiple nodes, thereby facilitating data and replica placement to be performed jointly. We establish the generic nature of UnifyDR by portraying its ability to address the CDR problem in two real-world use-cases, that of join-intensive online analytical processing (OLAP) queries and a location-based online social network (OSN) service. The effectiveness and scalability of UnifyDR are showcased by experiments performed on data generated using the TPC-DS benchmark and a trace of the Gowalla OSN for the OLAP queries and OSN service use-case, respectively. Empirically, the presented approach obtains an improvement of approximately 35% in terms of the evaluated metrics and a speed-up of 8 times in comparison to state-of-the-art techniques.


Revenue maximization approaches in IaaS clouds: Research challenges and opportunities

Badshah, Ghani, Daud, et al. 2022

Trans Emerging Tel Tech


This study critically reviews the IaaS clouds development on revenue maximization since 2012 to answer these research queries; (i) What are the main influential factors towards revenue maximization in the cloud market? (ii) What are the main challenges and resistance towards revenue maximization in cloud computing? and (iii) What are the possible solutions and potentials to these hurdles in cloud computing? The data was analyzed and the influencing factors of revenue maximization were classified into seven distinct categories, that is, the performance of the services, service level agreement and penalties management, resources scalability, resources utilization and scheduling, customers' satisfaction, cost, and pricing management, as well as advertisement and auction. These parameters are investigated in detail and new dynamics for researchers in the field of the cloud are discovered. These studies are compared against each other for the seven distinct categories and solutions are proposed for the clouds' obstacles to revenue maximization. Furthermore, in the light of the findings and revenue maximization categories, the main limitations, challenges, true potential, and new directions towards revenue maximization are explored.


A Middleware-Based Approach for Latency-Sensitive Service Provisioning in IoT with End-Edge Cooperation

Sun et al. 2023

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

