Abstract: Cloud computing has drawn increasing attention from the scientific computing community due to its ease of use, elasticity, and relatively low cost. Because a high-performance computing (HPC) application is usually resource demanding, without careful planning it can incur a high monetary expense even in the Cloud. We design a tool called CAP3 (Cloud AutoProvisioning framework for Parallel Processing) to help a user minimize the expense of running an HPC application in the Cloud, while meeting the user-specif…
“…Meanwhile, more efficient selection algorithms will be investigated. In addition, we will adapt the framework to generate fault handling strategies for cloud services by combining our works [17], [18], [19], [20] in the fields of service computing and cloud computing.…”
Resilience is an important factor in designing web service-oriented systems due to frequent failures arising at runtime. These failures derive from the stochastic and uncertain nature of a composite web service. Service providers need to address issues rapidly when a fault occurs in a running system, but it is not easy to locate and fix faults using only the log generated by the system. In this paper, we propose a resilient framework that automatically generates a fault handling strategy for each failed service to improve the efficiency of fault handling. In the framework, we design and implement three components: an exception analyzer, a decision maker, and a strategy selector. First, the exception analyzer builds a record, derived from the system log generated by an application, for each failed service. Next, the decision maker adopts a k-means clustering approach to construct a decision, including the fault handling action, for each failed service within a scope. Then, the strategy selector uses an integer program solver to solve the strategy selection problem, which is formulated as an optimization problem. Experiments show that the framework can improve the resilience of Web service-oriented systems with acceptable overhead, while the accuracy of the fault handling strategy exceeds 95%.
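The decision maker's clustering step can be illustrated with a minimal sketch: group fault records (feature vectors derived from the system log) with k-means, then map each cluster to a handling strategy. The feature layout, the tiny hand-written k-means, and the strategy names ("retry", "substitute with alternate service") are illustrative assumptions, not the paper's actual schema or algorithm.

```python
import math

def kmeans(points, k, iters=20):
    # Deterministic sketch: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each record joins its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical fault records: [response_time_ms, retry_failures, severity]
fault_records = [
    [120, 1, 1],   # fast response, transient failure
    [2500, 5, 3],  # slow response, persistent failure
    [130, 2, 1],
    [2400, 6, 3],
]

labels = kmeans(fault_records, k=2)
# Illustrative cluster-to-strategy mapping (an assumption for this sketch).
strategies = {0: "retry", 1: "substitute with alternate service"}
decisions = [strategies[lab] for lab in labels]
```

Here the two transient failures cluster together and are assigned "retry", while the persistent ones are routed to substitution; the actual framework would feed such per-cluster decisions into the integer-programming strategy selector.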
“…There is a considerable amount of research work addressing cloud resource provisioning and scheduling from the user or consumer perspective. Some authors have studied how to implement hybrid provisioning of resources between several cloud providers, or even between different computing infrastructures such as grids and clouds.…”
Summary: Cloud computing has permeated the IT industry in the last few years, and it is now emerging in scientific environments. Science user communities demand a broad range of computing power to satisfy the needs of high-performance applications, spanning local clusters, High Performance Computing (HPC) systems, and computing grids. Different workloads require different computational models, and the cloud is already considered a promising paradigm. The scheduling and allocation of resources is always a challenging matter in any form of computation, and clouds are no exception. Science applications have unique features that differentiate their workloads, so their requirements have to be taken into consideration when building a Science Cloud. This paper discusses the main scheduling and resource allocation challenges for any Infrastructure as a Service (IaaS) provider supporting scientific applications.
“…(4) PIY optimizes network traffic by decreasing the amount of transmitted data located on nodes acting as both Mappers and Reducers. (5) We conduct a performance evaluation of PIY in YARN (Hadoop 2.6.0). Compared with some other popular strategies, PIY can reduce the execution time by 35.62% and 50.65% in homogeneous and heterogeneous Hadoop clusters, respectively.…”
Section: Hash(hashcode(intermediate data) mod ReducerNum)
“…In addition, many DataNodes act as both Mapper and Reducer [12]. If the partition method in the shuffle phase can keep as many intermediate <key,value> pairs as possible on these DataNodes, network traffic is further reduced [5]. It is assumed that there are many <key,value> pairs corresponding to a particular key on those DataNodes simultaneously.…”
Section: Network Traffic In Shuffle Phase
Data skew, cluster heterogeneity, and network traffic are three issues that significantly influence the performance of MapReduce applications. However, the Hash-Partitioner in native Hadoop does not consider them. This paper proposes a new partitioner for YARN (Hadoop 2.6.0), namely PIY, which adopts an innovative parallel sampling method to estimate the distribution of the intermediate data. Based on this, PIY first mitigates data skew in MapReduce applications. Second, PIY considers the heterogeneity of the computing resources to balance the load among Reducers. Third, PIY reduces network traffic in the shuffle phase by trying to retain intermediate data on nodes that act as both mapper and reducer. Compared with native Hadoop and some other popular strategies, PIY can reduce the execution time by 35.62% and 50.65% in homogeneous and heterogeneous clusters, respectively. We also implement PIY in parallel image processing; compared with several existing strategies, PIY can reduce the execution time by 11.2%.
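The contrast between the skew-oblivious default and a sampling-based partitioner can be sketched as follows. The default behaviour mirrors Hadoop's hash(key) mod ReducerNum rule named in the section header above; the skew-aware table uses a greedy assignment of the heaviest sampled keys to the least-loaded reducer, weighted by per-reducer capacity to model heterogeneity. The greedy scheme, the capacity values, and the key sample are illustrative assumptions, not PIY's actual algorithm.

```python
from collections import Counter

NUM_REDUCERS = 3

def hash_partition(key):
    # Native-Hadoop-style behaviour: hash(key) mod ReducerNum.
    # Skew-oblivious: a single hot key can overload one reducer.
    return hash(key) % NUM_REDUCERS

def build_skew_aware_partition(sampled_keys, capacities):
    """Greedily assign each sampled key (heaviest first) to the reducer
    with the lowest load relative to its capacity (heterogeneity)."""
    load = [0.0] * len(capacities)
    table = {}
    for key, count in Counter(sampled_keys).most_common():
        target = min(range(len(capacities)),
                     key=lambda r: load[r] / capacities[r])
        table[key] = target
        load[target] += count
    return table

# A skewed key sample: "a" dominates the intermediate data.
sample = ["a"] * 60 + ["b"] * 25 + ["c"] * 10 + ["d"] * 5
# Reducer 0 sits on a node twice as fast as the other two (assumption).
partition = build_skew_aware_partition(sample, capacities=[2.0, 1.0, 1.0])
```

Under hash partitioning the hot key "a" lands wherever its hash points, regardless of load; the sampled table instead sends it to the highest-capacity reducer and spreads the remaining keys, which is the kind of balancing PIY's distribution estimate enables.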