Structuring PLFS for extensibility

Cranor, Charles D.; Polte, Milo; Gibson, Garth A.

doi:10.1145/2538542.2538564

Cited by 6 publications (6 citation statements)

References 18 publications (16 reference statements)

“…In order to achieve a maximum amount of local data reads, we employ the standard max-flow algorithm, Ford-Fulkerson [7], to compute the largest flow from s to t. The algorithm will iterate many times. In each iteration it increases the number of tasks/files assigned to processes.…”

Section: B. Optimization of Parallel Single-Data Access (mentioning)

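The quoted text runs Ford-Fulkerson from a source s to a sink t to maximize local reads. Below is a minimal sketch under assumed network rules: s connects to each process with capacity equal to the number of files that process may read, each process connects with capacity 1 to every file it holds locally, and each file connects to t with capacity 1. The paper's actual graph (its Figure 4) is not reproduced here and may differ. Under these assumptions the maximum flow equals the largest number of files that can be read locally. The helpers `max_flow` and `assign_local_reads`, and the input dictionaries, are hypothetical.

```python
from collections import defaultdict


def max_flow(capacity, s, t):
    """Ford-Fulkerson: repeatedly find an s->t path with spare residual
    capacity (here by depth-first search) and push its bottleneck along it."""
    # residual[u][v] = capacity still available on u->v; reverse edges start at 0
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual[u][v] += c
            residual[v][u] += 0

    def augment(u, limit, seen):
        if u == t:
            return limit
        seen.add(u)
        for v, spare in residual[u].items():
            if spare > 0 and v not in seen:
                pushed = augment(v, min(limit, spare), seen)
                if pushed:
                    residual[u][v] -= pushed
                    residual[v][u] += pushed  # lets a later path undo this flow
                    return pushed
        return 0

    flow = 0
    while True:
        pushed = augment(s, float("inf"), set())
        if not pushed:
            return flow, residual
        flow += pushed


def assign_local_reads(locality, slots):
    """Hypothetical helper: match files to processes holding a local copy.

    locality: file -> list of processes with a local replica of that file
    slots:    process -> maximum number of files that process may read
    """
    cap = defaultdict(dict)
    for proc, k in slots.items():
        cap["s"][("proc", proc)] = k  # source -> process: load cap
    for f, procs in locality.items():
        for proc in procs:
            cap[("proc", proc)][("file", f)] = 1  # a local read is possible
        cap[("file", f)]["t"] = 1  # each file is read once
    flow, residual = max_flow(cap, "s", "t")
    assignment = {}
    for f, procs in locality.items():
        for proc in procs:
            # a saturated process->file edge means that read was assigned
            if residual[("proc", proc)][("file", f)] == 0:
                assignment[f] = proc
    return flow, assignment


locality = {"f1": ["p1", "p2"], "f2": ["p1"], "f3": ["p2"], "f4": ["p1", "p2"]}
slots = {"p1": 2, "p2": 2}
flow, assignment = assign_local_reads(locality, slots)
print(flow)  # 4
print(assignment)  # {'f1': 'p2', 'f2': 'p1', 'f3': 'p2', 'f4': 'p1'}
```

In this run f1 is first routed to p1 and later moved to p2 through a reverse residual edge, which frees p1 for f4. That reverse-edge step is the kind of cancellation the quoted text describes.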

“…In each iteration it increases the number of tasks/files assigned to processes. With the use of flow-augmenting paths [7], if a task t has been assigned to process i, but the overall size of the graph's maximum matching could be increased by matching t with another process j, the assignment of t to i will be canceled and t is reassigned to j. Such a cancellation policy enables the assignments of processes on tasks to be optimal.…”

Section: B. Optimization of Parallel Single-Data Access (mentioning)

“…Such a cancellation policy enables the assignments of processes on tasks to be optimal. The formal proof can be found in [7]. The complexity of our implementation of task assignment is O(nE), where n is the number of files and E is the number of edges in Figure 4.…”

Section: B. Optimization of Parallel Single-Data Access (mentioning)

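The reassignment rule quoted above (a task matched to process i is moved to process j when that enlarges the matching) and the O(nE) bound match the textbook augmenting-path method for unit-capacity bipartite matching, often attributed to Kuhn. The sketch below shows that generic method, not the Opass implementation; the function name and the `candidates` input format are hypothetical, and it allows at most one task per process.

```python
def max_bipartite_matching(tasks, candidates):
    """Augmenting-path (Kuhn) matching: each task gets at most one process
    and each process at most one task.

    candidates: task -> list of processes that hold the task's data locally
    (a hypothetical input format used only for this sketch).
    """
    owner = {}  # process -> task currently assigned to it

    def try_assign(task, visited):
        for proc in candidates.get(task, []):
            if proc in visited:
                continue
            visited.add(proc)
            # Take proc if it is free, or if the task already holding it can
            # be moved to another process; that move cancels the old assignment.
            if proc not in owner or try_assign(owner[proc], visited):
                owner[proc] = task
                return True
        return False

    for task in tasks:
        # One DFS per task; each DFS scans every edge at most once, so O(n * E).
        try_assign(task, set())
    return {task: proc for proc, task in owner.items()}


# t1 is first matched to p1; t2 can only use p1, so t1 is moved to p2.
print(max_bipartite_matching(["t1", "t2"], {"t1": ["p1", "p2"], "t2": ["p1"]}))
# {'t2': 'p1', 't1': 'p2'}
```

Here n is the number of tasks (files, in the quoted text) and E the number of task-process edges: each of the n searches visits a process at most once, so it inspects each edge at most once.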

“…Because of this, tasks with multiple inputs will complicate the matching of processes to data. Such a matching problem is related to the stable marriage problem, which however only deals with one-to-one matching [7]. In this section, we propose a novel matching-based algorithm for this type of parallel data access.…”

Section: Optimization of Parallel Multi-Data Access (mentioning)

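The quoted text contrasts the multi-data case with the stable marriage problem, which only yields one-to-one matchings. For reference, here is a minimal sketch of the textbook Gale-Shapley algorithm for stable marriage. It is not the matching algorithm proposed in the citing paper; the function name, the process/node labels, and the preference lists are hypothetical. It assumes both sides have the same number of members and complete preference lists.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Stable marriage via Gale-Shapley (proposer-optimal).

    Both dicts map a name to a complete preference list, most preferred first.
    Returns acceptor -> proposer; every match is strictly one-to-one.
    """
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better)
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next acceptor to try
    engaged = {}  # acceptor -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if a not in engaged:
            engaged[a] = p
        elif rank[a][p] < rank[a][engaged[a]]:
            free.append(engaged[a])  # a trades up; its old partner is free again
            engaged[a] = p
        else:
            free.append(p)  # rejected; p will try its next choice
    return engaged


procs = {"p1": ["n1", "n2"], "p2": ["n1", "n2"]}
nodes = {"n1": ["p2", "p1"], "n2": ["p1", "p2"]}
print(gale_shapley(procs, nodes))  # {'n1': 'p2', 'n2': 'p1'}
```

Because `engaged` maps each node to exactly one process, a task whose inputs are spread across several nodes has no direct representation in this formulation, which is the one-to-one limitation the quoted text points out.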

Opass: Analysis and Optimization of Parallel Data Access on Distributed File Systems

Yin, Wang, Zhou, et al., 2015

2015 IEEE International Parallel and Distributed Processing Symposium

In this paper, we study parallel data access on distributed file systems, e.g., the Hadoop file system. Our experiments show that parallel data read requests are often served remotely and in an imbalanced fashion. This results in serious disk-access and data-transfer contention on certain cluster/storage nodes. We conduct a complete analysis of how remote and imbalanced read patterns occur and how they are affected by the size of the cluster. We then propose a novel method to Optimize Parallel Data Access on Distributed File Systems, referred to as Opass. The goal of Opass is to reduce remote parallel data accesses and achieve a better balance of data read requests across cluster nodes. To achieve this goal, we represent the data read requests that parallel applications issue to cluster nodes as a graph data structure whose edge weights encode the demands of data locality and load capacity. We then propose new matching-based algorithms that match processes to data based on the configuration of this graph, so as to achieve the maximum degree of data locality and balanced access. Our proposed method can benefit parallel data-intensive analysis with various parallel data access strategies. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results from both benchmarks and well-known parallel applications show the performance benefits and scalability of Opass.

“…Many studies have proposed using the Hadoop system for parallel data processing. Gibson [3] and Sun [5] propose methods to write parallel data into HDFS and achieve high I/O performance. MRAP [8] is proposed to reconstruct scientific data according to data access patterns to assist data processing using the Hadoop system.…”

Section: Related Work (mentioning)

Optimize Parallel Data Access in Big Data Processing

Yin, Wang, 2015

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

In recent years, the Hadoop Distributed File System (HDFS) has been deployed as the bedrock for many parallel big data processing systems, such as graph processing systems, MPI-based parallel programs, and Scala/Java-based Spark frameworks, which can efficiently support iterative and interactive data analysis in memory. The first part of my dissertation focuses on parallel data access on distributed file systems, e.g., HDFS. Because distributed I/O resources and global data distribution are often not taken into consideration, data requests from parallel processes/executors are often served remotely or in an imbalanced fashion on the storage servers. To address these problems, we develop I/O middleware systems and matching-based algorithms that map parallel data requests to storage servers so that local and balanced data access can be achieved. The last part of my dissertation presents our plans to improve the performance of interactive data access in big data analysis. Specifically, most interactive analysis programs scan through the entire data set regardless of which data is actually required. We plan to develop a content-aware method to quickly access the required data without this laborious scanning process.