Federico Silla scite author profile

Restricted access item: This item is not open access because of the publisher's policy. The full-text version is available only from Universitat Jaume I or to users with a current subscription to the publisher's content.

Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing

Rodrigo, Flich, Roca, et al. 2010

The high-performance computing domain is being enriched by the inclusion of Networks-on-Chip (NoCs) as a key component of many-core architectures (CMPs or MPSoCs). NoCs must address the communication scalability challenge while meeting tight power, area, and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism, or power-aware techniques may break topology regularity, so efficient routing becomes a challenge. In this paper, uLBDR (Universal Logic-Based Distributed Routing) is proposed as an efficient logic-based mechanism that adapts to any irregular topology derived from 2D meshes, as an alternative to the use of routing tables (either at routers or at end-nodes). uLBDR requires a small set of configuration bits, making it more practical than large routing tables implemented in memories. Several implementations of uLBDR are presented, highlighting the trade-off between routing cost and coverage. The alternatives span from the previously proposed LBDR approach (with 30% coverage) to the uLBDR mechanism, which achieves full coverage. This comes with a small performance cost, thus exhibiting the trade-off between fault tolerance and performance.

An Efficient Implementation of GPU Virtualization in High Performance Clusters

Duato, Igual, Mayo, et al. 2010

Current high-performance clusters are equipped with high-bandwidth, low-latency networks, large numbers of processors and nodes, very fast storage systems, etc. However, due to economic and/or power-related constraints, it is generally not feasible to provide an accelerating coprocessor, such as a graphics processor (GPU), per node. To overcome this, in this paper we present a GPU virtualization middleware that makes remote CUDA-compatible GPUs available to all the cluster nodes. The software is implemented on top of the sockets application programming interface, ensuring portability over commodity networks, but it can also be easily adapted to high-performance networks.

High-performance routing in networks of workstations with irregular topology

Silla, Duato 2000

IEEE Trans. Parallel Distrib. Syst.

Networks of workstations are rapidly emerging as a cost-effective alternative to parallel computers. Switch-based interconnects with irregular topology allow the wiring flexibility, scalability, and incremental expansion capability required in this environment. However, the irregularity also makes routing and deadlock avoidance on such systems quite complicated. In current proposals, many messages are routed following nonminimal paths, increasing latency and wasting resources. In this paper, we propose two general methodologies for the design of adaptive routing algorithms for networks with irregular topology. Routing algorithms designed according to these methodologies allow messages to follow minimal paths in most cases, reducing message latency and increasing network throughput. As an example of application, we propose two adaptive routing algorithms for AN1 (previously known as Autonet). They can be implemented either by duplicating physical channels or by splitting each physical channel into two virtual channels. In the former case, the implementation does not require a new switch design. It only requires changing the routing tables and adding links in parallel with existing ones, taking advantage of spare switch ports. In the latter case, a new switch design is required, but the network topology is not changed. Evaluation results for several different topologies and message distributions show that the new routing algorithms are able to increase throughput for random traffic by a factor of up to 4 with respect to the original up*/down* algorithm, also reducing latency significantly. For other message distributions, throughput is increased more than seven times. We also show that most of the improvement comes from the use of minimal routing.

Enabling CUDA acceleration within virtual machines using rCUDA

Duato, Peña, Silla, et al. 2011

The hardware and software advances of Graphics Processing Units (GPUs) have favored the development of GPGPU (General-Purpose Computation on GPUs) and its adoption in many scientific, engineering, and industrial areas. Thus, GPUs are increasingly being introduced in high-performance computing systems as well as in datacenters. On the other hand, virtualization technologies are also receiving rising interest in these domains because of their many benefits in acquisition and maintenance savings. There are currently several works on GPU virtualization. However, there is no standard solution allowing access to GPGPU capabilities from virtual machine environments such as VMware, Xen, VirtualBox, or KVM. This lack of a standard solution is delaying the integration of GPGPU into these domains. In this paper, we propose a first step towards a general and open source approach for using GPGPU features within VMs. In particular, we describe the use of rCUDA, a GPGPU virtualization framework, to permit the execution of GPU-accelerated applications within virtual machines (VMs), thus enabling GPGPU capabilities on any virtualized environment. Our experiments with rCUDA in the context of KVM and VirtualBox on a system equipped with two NVIDIA GeForce 9800 GX2 cards illustrate the overhead introduced by the rCUDA middleware and prove the feasibility and scalability of this general virtualizing solution. Experimental results show that the overhead is proportional to the dataset size, while the scalability is similar to that of the native environment.

Local and Remote GPUs Perform Similar with EDR 100G InfiniBand

Reaño, Silla, Shainer, et al. 2015

The use of graphics processing units (GPUs) to accelerate some portions of applications is widespread nowadays. To avoid the usual inconveniences associated with these accelerators (high acquisition cost, high energy consumption, and low utilization), one possible solution is sharing them among several nodes in the cluster. Several years ago, remote GPU virtualization middleware systems appeared to implement this solution. Although these systems tackled the aforementioned inconveniences, their performance was usually impaired by the low bandwidth attained by the underlying network. However, recent advances in InfiniBand fabrics have changed this trend. In this paper we analyze how the high bandwidth provided by the new EDR 100G InfiniBand fabric allows remote GPU virtualization middleware systems not only to perform very similarly to local GPUs, but also to improve overall performance for some applications.

Improving the efficiency of adaptive routing in networks with irregular topology

Silla, Duato

Networks of workstations are emerging as a cost-effective alternative to parallel computers. The interconnection between workstations usually relies on switch-based networks with irregular topologies. This irregularity makes routing and deadlock avoidance quite complicated. Current proposals avoid deadlock by removing cyclic dependencies between channels and, therefore, many messages are routed along non-minimal paths, increasing latency and wasting resources. In this paper, we propose a general methodology for the design of adaptive routing algorithms for networks with irregular topology that improves over a previously proposed one by reducing the probability of routing over non-minimal paths. The resulting routing algorithms allow messages to follow minimal paths in most cases, reducing message latency and increasing network throughput. As an example of application, we propose an improved adaptive routing algorithm for Autonet.

Federico Silla

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

A complete and efficient CUDA-sharing solution for HPC clusters
