Fatimah Alsayoud scite author profile

Big data ecosystems contain a mix of sophisticated hardware storage components to support heterogeneous workloads. Storage components and workloads interact and affect each other; therefore, their relationship has to be considered when modeling workloads or managing storage. Efficient workload modeling guides optimal storage management decisions, and the right decisions help guarantee the workload’s needs. The first part of this thesis focuses on workload modeling efficiency, and the second part focuses on cost-effective storage management.

Workload performance modeling is an essential step in management decisions. The standard modeling approach constructs the model from a historical dataset collected under one set of setups (a scenario), so the model has to be reconstructed from scratch every time the setup changes. To address this issue, we propose a cross-scenario modeling approach that improves the workload’s performance classification accuracy by up to 78% by adopting Transfer Learning (TL).

The storage system is the most crucial component of the big data ecosystem, since the workload’s execution process starts by fetching data from it and ends by storing data into it. Thus, the workload’s performance is directly affected by storage capability. To provide high I/O performance, Solid State Drives (SSDs) are utilized as a tier or as a cache in distributed big data ecosystems. SSDs have a short lifespan that is affected by data size and the number of write operations. Balancing performance requirements against SSD lifespan consumption is never easy, and it is even harder when dealing with a huge amount of data and heterogeneous I/O patterns. In this thesis, we analyze the impact of big data workloads’ I/O patterns on SSD lifespan when the SSD is used as a tier or as a cache. Then, we design a Hidden Markov Model (HMM) based I/O pattern controller that manages workload placement and provides cost-effective storage, enhancing workload performance by up to 60% and improving SSD lifespan by up to 40%.

The designed transfer learning modeling approach and the storage management solutions improve workload modeling accuracy and the quality of the storage management policies when the testing setup changes.


Transfer Learning Based Performance Modeling And Effective Storage Management In Big Data Ecosystems

Alsayoud¹

2021

Preprint




Fatimah Alsayoud

Cross-Scenario Performance Modelling for Big Data Ecosystems

SSD: Cache or Tier an Evaluation of SSD Cost and Efficiency using MapReduce

HMM Optimized Modeling of SSD Storage for I/O MapReduce Workloads

Transfer Learning Based Performance Modeling And Effective Storage Management In Big Data Ecosystems

