Abstract: Since cloud computing provides computing resources on a pay-per-use basis, the task scheduling algorithm directly affects the cost incurred by users. In this paper, we propose a novel cloud task scheduling algorithm based on ant colony optimization (ACO) that allocates the tasks of cloud users to virtual machines in cloud computing environments in an efficient manner. To enhance the performance of the ACO-based task scheduler in cloud computing environments, we adapt diversification and reinforcement strategies…
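The truncated abstract does not show the algorithm itself, so the following is only a minimal sketch of how an ACO task-to-VM scheduler of this general shape can work: a pheromone matrix guides probabilistic assignments, evaporation diversifies the search, and reinforcement rewards the best schedule found so far. All function names, parameters, and the makespan objective below are illustrative assumptions, not the authors' implementation.

```python
import random

# Minimal ACO sketch for assigning tasks to VMs (illustrative only; the
# paper's specific diversification/reinforcement strategies are not shown).
def aco_schedule(task_lengths, vm_mips, n_ants=10, n_iters=50,
                 alpha=1.0, beta=2.0, rho=0.1, q=1.0):
    n_tasks, n_vms = len(task_lengths), len(vm_mips)
    # pheromone[i][j]: learned desirability of running task i on VM j
    pheromone = [[1.0] * n_vms for _ in range(n_tasks)]
    best_assign, best_makespan = None, float("inf")

    for _ in range(n_iters):
        for _ in range(n_ants):
            loads = [0.0] * n_vms
            assign = []
            for i in range(n_tasks):
                # heuristic: prefer VMs that would finish task i sooner
                weights = [
                    (pheromone[i][j] ** alpha) *
                    ((1.0 / (loads[j] + task_lengths[i] / vm_mips[j])) ** beta)
                    for j in range(n_vms)
                ]
                # roulette-wheel selection over the weights
                total = sum(weights)
                r, acc, choice = random.random() * total, 0.0, n_vms - 1
                for j, w in enumerate(weights):
                    acc += w
                    if r <= acc:
                        choice = j
                        break
                assign.append(choice)
                loads[choice] += task_lengths[i] / vm_mips[choice]
            makespan = max(loads)
            if makespan < best_makespan:
                best_assign, best_makespan = assign, makespan
        # evaporation (diversification) plus reinforcement of the best schedule
        for i in range(n_tasks):
            for j in range(n_vms):
                pheromone[i][j] *= (1.0 - rho)
            pheromone[i][best_assign[i]] += q / best_makespan
    return best_assign, best_makespan

print(aco_schedule([400, 800, 200, 600], [100, 250]))
```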
“…Hadoop Distributed File System (HDFS) [25] is a block-structured file system based on Google File System (GFS). Large files of over tens of terabytes can be stored in the distributed server, and storage can also be configured using the low-end server, which has advantages over existing large file systems (NAS, DAS, SAN, etc.).…”
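As a side illustration (not taken from the cited paper), HDFS can be accessed from Python through the third-party HdfsCLI package, which talks to the NameNode over the WebHDFS REST interface; the endpoint URL, user, and paths below are assumptions for the sketch.

```python
from hdfs import InsecureClient  # pip install hdfs (HdfsCLI, WebHDFS client)

# Assumed WebHDFS endpoint; adjust host/port/user for a real cluster.
client = InsecureClient("http://namenode:9870", user="hadoop")

# Write a file; HDFS splits it into fixed-size blocks replicated across
# DataNodes, which is what makes multi-terabyte files feasible.
client.write("/data/sample.txt", data=b"hello hdfs", overwrite=True)

# Read it back.
with client.read("/data/sample.txt") as reader:
    print(reader.read())
```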
IT technology and traditional industries have recently been combined, resulting in IT convergence technology in various fields. Through convergence with the automobile industry, pedestrian detection technology in particular is used in the autonomous navigation control of self-driving vehicles and is also applied in fields such as intelligent CCTV and robot recognition. Early pedestrian detection relied on hierarchical classification and hand-crafted feature vectors, while deep learning approaches are now being actively developed. However, since deep learning for pedestrian detection is time-consuming when processing a large volume of image data, it requires substantial computing resources, and building such a system is therefore very expensive. In this paper, we present a distributed deep learning platform that can easily build a cluster and execute the deep learning process in a distributed cloud environment, while improving performance in several ways. Our platform provides a convenient interface for executing the deep learning process easily and efficiently in a distributed environment through a multilayered system architecture. Our system builds and utilizes computing power in an easy and efficient way by leveraging container technology, so-called OS-level virtualization, rather than traditional hypervisor-based virtualization. We improve overall performance by exploiting both data and parameter parallelism at once, and we reduce synchronization overhead by using asynchronous communication for parameter updates. We also propose an efficient resource allocation scheme for parameter servers and slaves, which our experiments show improves performance.
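The abstract does not describe the update protocol in detail, so here is a minimal sketch of asynchronous parameter updates between data-parallel workers and a parameter server, using Python threads as stand-ins for distributed processes; the class, loss, and shard layout are assumptions for illustration, not the platform's actual API.

```python
import threading
import random

# Toy parameter server: workers push gradients asynchronously and pull
# fresh parameters without waiting at a global synchronization barrier.
class ParamServer:
    def __init__(self, dim, lr=0.01):
        self.params = [0.0] * dim
        self.lr = lr
        self.lock = threading.Lock()  # protects the shared parameters

    def push_gradient(self, grad):
        # apply the update immediately; no waiting for other workers
        with self.lock:
            for k in range(len(self.params)):
                self.params[k] -= self.lr * grad[k]

    def pull(self):
        with self.lock:
            return list(self.params)

def worker(ps, shard, steps=100):
    # data parallelism: each worker trains on its own data shard
    for _ in range(steps):
        w = ps.pull()
        x = random.choice(shard)
        # gradient of the quadratic loss (w - x)^2 on one sample
        grad = [2.0 * (wk - x) for wk in w]
        ps.push_gradient(grad)

ps = ParamServer(dim=1)
shards = [[1.0, 1.2], [0.8, 1.0]]  # two shards, one per worker
threads = [threading.Thread(target=worker, args=(ps, s)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()
print("learned parameter:", ps.pull())
```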
“…Figure 7 shows the RBD of the Google cluster. As it is illustrated in Figure 7, the reliability of the Google cluster is considered as a parallel system that can be calculated according to Equation (13) as follows:…”
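The snippet elides Equation (13) itself, so it is left as-is above. For reference, an RBD in which all blocks are in parallel evaluates to the standard parallel-system reliability, where R_i is the reliability of component i; presumably Equation (13) instantiates this textbook formula, but that is an assumption.

```latex
% Reliability of a parallel system of n independent components:
% the system fails only if all components fail.
R_{\text{parallel}} = 1 - \prod_{i=1}^{n} \bigl(1 - R_i\bigr)
```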
“…Cloud computing data centers offer thousands of physical servers networked via high-bandwidth network infrastructures that communicate with one another to provide highly available and flexible services [2]. Although cloud computing services offer many benefits, there are also many open issues and research problems in providing them, such as load balancing solutions [3-7], security challenges [8-11], task scheduling [12-14], and high availability/reliability challenges [15-19]. The large-scale, heterogeneous nature of cloud services leads to frequent failures in these systems.…”
Cloud solutions are emerging as a suitable new way of transforming traditional IT data centers into highly available and reliable computing resources for hosting critical applications and data. However, software and hardware failures are a common problem in cloud data centers and can lead to harmful damage. In this paper, we analyze physical server failures in the Google cloud data center. We study the Google cluster properties to investigate the relationship between physical server failure rates and job failure events. The failure rates of jobs executed on the Google cluster and of its servers are considered over a 29-day period. Based on these observations, we present a reliability model of the Google cluster's physical machines using continuous-time Markov chains. We analyze the obtained model with the SHARPE software package to improve the understanding of failure events in the Google cloud cluster. We also explore the cluster's availability through parameters such as steady-state availability, steady-state unavailability, mean time to failure, and mean time to repair.
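The availability metrics named in the abstract are related by standard results; for a two-state (up/down) CTMC with failure rate λ and repair rate μ, the steady-state availability is given by the following textbook identities (stated here for context, not taken from the paper):

```latex
% Two-state CTMC: failure rate \lambda (up -> down), repair rate \mu (down -> up).
% MTTF = 1/\lambda and MTTR = 1/\mu, so
A = \frac{\mu}{\lambda + \mu}
  = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}},
\qquad
U = 1 - A = \frac{\lambda}{\lambda + \mu}.
```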
“…The end-users vary from naive clients to expert technicians. The cloud is a pool of resources shared among a number of users [1]. Presently, in the world of cloud computing, it is the era of XaaS (Anything-as-a-Service), which means that providers offer a wide variety of services [2,3].…”
Introduction: Cloud computing is being used for innumerable applications these days. The end-users vary from naive clients to expert technicians. The cloud is a pool of resources shared among a number of users [1]. Presently, in the world of cloud computing, it is the era of XaaS (Anything-as-a-Service), which means that providers offer a wide variety of services [2,3]. One of the most recent services provided through the cloud is high-performance computing (HPC) environments for complex applications. Virtualization is the technology that enables a single physical entity to be shared among a group of users.
Abstract: Cloud computing is the driving power behind the current technological era, and virtualization is rightly referred to as its backbone. The impact of virtualization in high-performance computing (HPC) has been widely reviewed by researchers. The overhead of the virtualization layer was one of the reasons that hindered its adoption in HPC environments. Recent developments in virtualization, especially OS container-based virtualization, provide a solution that employs a lightweight virtualization layer and promises less overhead. Containers are advantageous over virtual machines in terms of performance overhead, which is a major concern for both data-intensive and compute-intensive applications. Several industries have adopted container technologies such as Docker. While Docker is widely used, it has certain pitfalls, such as security issues. The recently introduced CoreOS Rkt container technology overcomes these shortcomings of Docker, but there has not been much research on how well the Rkt environment suits high-performance applications. The differences in the Rkt container stack suggest better support for high-performance applications, which comprise CPU-intensive and data-intensive workloads. The High Performance Linpack (HPL) library and Graph500 are commonly used compute-intensive and data-intensive benchmark applications, respectively. In this work, we explore the feasibility of the interoperable Rkt container for high-performance applications by running the HPL and Graph500 benchmarks and comparing its performance with commonly used container technologies such as LXC and Docker. Based on the position of the virtualization layer, virtualization can be of different types: full virtualization, paravirtualization, and OS-level virtualization.
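The abstract does not show the measurement harness, so the sketch below only illustrates the general shape of a wall-clock runtime comparison across container engines; the benchmark image name is hypothetical, and this is not the paper's methodology (the authors ran HPL and Graph500). The docker and rkt invocations use standard CLI forms (rkt can launch Docker-format images via the docker:// scheme).

```python
import subprocess
import time

# Hypothetical benchmark image; a real comparison would run HPL or
# Graph500 images under each runtime and average several trials.
RUNTIMES = {
    "docker": ["docker", "run", "--rm", "benchmark-image"],
    "rkt": ["rkt", "run", "--insecure-options=image",
            "docker://benchmark-image"],
}

def time_runtime(cmd, trials=3):
    elapsed = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # run the containerized benchmark
        elapsed.append(time.perf_counter() - start)
    return min(elapsed)  # best-of-n reduces scheduling noise

for name, cmd in RUNTIMES.items():
    print(name, time_runtime(cmd))
```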
Research: Traditional HPC clusters are composed of many separate dedicated servers, called nodes, and may be shared among different organizations. The requirements of each user or organization differ, which demands the creation of customized environments without affecting other users; this is not an easy task in traditional HPC systems. As a solution, virtualization was adopted for HPC. Virtualization materializes the task by creating s...