The recent rapid growth of the data-flow programming paradigm has enabled the development of domain-specific architectures, e.g., for machine learning. The best-known example is Google's Tensor Processing Unit (TPU). Standard data centers, however, still cannot dedicate large partitions to machine-learning-specific architectures. Within data centers, High-Performance Computing (HPC) clusters are highly parallel machines targeting a broad class of compute-intensive workloads; as such, they can be used to tackle machine learning challenges. On top of this, HPC architectures are changing rapidly, including accelerators and instruction sets beyond the classical x86 CPUs. In this heterogeneous scenario, identifying the best hardware/software configurations to efficiently support machine learning workloads on HPC clusters is not trivial. In this paper, we consider the TensorFlow workflow for image recognition. We highlight the strong dependency of training-phase performance on the availability of arithmetic libraries optimized for the underlying architecture. Following the example of Intel, which leverages the MKL libraries to improve TensorFlow performance, we plugged the Arm Performance Libraries into TensorFlow and tested it on an HPC cluster based on Marvell ThunderX2 CPUs. We also performed a scalability study on three state-of-the-art HPC clusters based on different CPU architectures: x86 Intel Skylake, Arm-v8 Marvell ThunderX2, and PowerPC IBM Power9.
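The sensitivity of dense arithmetic to the underlying optimized library can be illustrated outside TensorFlow. The sketch below (an illustrative aside, not the paper's benchmark) compares a pure-Python matrix multiply with NumPy's `@` operator, which delegates to whatever BLAS implementation NumPy was built against (MKL, Arm Performance Libraries, OpenBLAS, etc.):

```python
import time
import numpy as np

def naive_matmul(a, b):
    """Triple-loop matrix multiply: no optimized arithmetic library behind it."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out

rng = np.random.default_rng(0)
a = rng.random((120, 120))
b = rng.random((120, 120))

t0 = time.perf_counter()
c_naive = naive_matmul(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
c_blas = a @ b  # delegates to the BLAS library NumPy links against
t_blas = time.perf_counter() - t0

print(f"naive: {t_naive:.3f}s  BLAS-backed: {t_blas:.4f}s")
```

The gap of several orders of magnitude for this single kernel is the same effect, at small scale, that the paper observes when TensorFlow training runs with and without an architecture-tuned library.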
For complex engineering and scientific applications, Computational Fluid Dynamics (CFD) simulations require a huge amount of computational power. As such, it is of paramount importance to carefully assess the performance of CFD codes and to study them in depth to enable optimisation and portability. In this paper, we study three complex CFD codes (OpenFOAM, Alya, and CHORUS) representing two numerical methods, namely the finite-volume and finite-element methods, on both structured and unstructured meshes. To all codes, we apply a generic performance analysis method based on a set of metrics that helps code developers spot the critical points that can potentially limit the scalability of a parallel application. We show the root cause of the performance bottlenecks by studying the three applications on the MareNostrum4 supercomputer. We conclude by providing hints for improving the performance and the scalability of each application.
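Metric-based performance analyses of this kind often decompose parallel efficiency into multiplicative factors computed from per-process useful compute time and total runtime. A minimal sketch of one widely used formulation (the POP-style efficiency metrics; the abstract does not specify the exact metric set, so the function and names below are illustrative):

```python
def pop_efficiencies(useful, runtime):
    """Multiplicative efficiency metrics from per-process useful compute
    time and total wall-clock runtime (all in the same time units).

    Load Balance   = mean(useful) / max(useful)
    Comm. Eff.     = max(useful) / runtime
    Parallel Eff.  = Load Balance * Comm. Eff. = mean(useful) / runtime
    """
    avg = sum(useful) / len(useful)
    mx = max(useful)
    lb = avg / mx
    comm = mx / runtime
    return {"load_balance": lb,
            "comm_efficiency": comm,
            "parallel_efficiency": lb * comm}

# Hypothetical trace: 4 MPI ranks, per-rank useful compute time, total runtime.
metrics = pop_efficiencies([9.0, 8.0, 7.0, 8.0], runtime=10.0)
print(metrics)
```

Whichever factor is lowest (here, load balance) points the developer at the class of bottleneck to investigate first, which is the role such metrics play in the analysis method described above.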
HPC systems and parallel applications are increasing in complexity. Therefore, the ability to easily study the performance of scientific applications and to project it at large scale is of paramount importance. In this paper, we describe a performance analysis method and apply it to four complex HPC applications. We perform our study on a pre-production HPC system powered by the latest Arm-based CPU for HPC, the Marvell ThunderX2. For each application, we spot inefficiencies and factors that limit its scalability. The results show that in several cases the bottlenecks do not come from the hardware but from the way the applications are programmed or the way the system software is configured.
The degradation of critical components inside large industrial assets, such as ball bearings, has a negative impact on production facilities, reducing the availability of assets due to an unexpectedly high failure rate. Machine-learning-based monitoring systems can estimate the remaining useful life (RUL) of ball bearings, reducing downtime through early failure detection. However, traditional approaches for predictive systems require run-to-failure (RTF) data as training data, which in real scenarios can be scarce and expensive to obtain, as the expected useful life may be measured in years. Therefore, to overcome the need for RTF data, we propose a new methodology based on online novelty detection and asymmetrical hidden Markov models (As-HMM) to perform the health assessment. This new methodology does not require prior RTF data and can adapt to the natural degradation of mechanical components over time in data-stream and online environments. As the system is designed to work online within the electrical cabinet of machines, it has to be deployed using embedded electronics. Therefore, a performance analysis of As-HMM is presented to detect the strengths and critical points of the algorithm. To validate our approach, we use real-life ball-bearing datasets, compare our methodology with other methodologies that require no RTF data, and assess the advantages in RUL prediction and health monitoring. As a result, we showcase a complete end-to-end solution, from the sensor to actionable insights on RUL estimation, for maintenance application in real industrial environments.
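The key property exploited here, training only on healthy data so that no run-to-failure labels are needed, can be sketched with a toy online novelty detector. This is not the As-HMM of the paper; it is a minimal stand-in for the online novelty-detection stage, with all names and thresholds assumed for illustration:

```python
from collections import deque
import math

class StreamingNoveltyDetector:
    """Flag samples that deviate from a rolling baseline of healthy behavior.

    Toy stand-in for the online novelty-detection stage: the baseline is
    built only from healthy data (no run-to-failure labels), and a sample
    is 'novel' when its z-score against that baseline exceeds a threshold.
    """
    def __init__(self, window=50, threshold=3.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x):
        if len(self.buf) < self.buf.maxlen:
            self.buf.append(x)          # still building the healthy baseline
            return False
        mean = sum(self.buf) / len(self.buf)
        var = sum((v - mean) ** 2 for v in self.buf) / len(self.buf)
        std = math.sqrt(var) or 1e-12   # guard against a zero-variance window
        novel = abs(x - mean) / std > self.threshold
        if not novel:
            self.buf.append(x)          # adapt to slow, natural degradation
        return novel

det = StreamingNoveltyDetector(window=20, threshold=3.0)
healthy = [1.0 + 0.01 * (i % 5) for i in range(40)]   # stable vibration level
flags = [det.update(v) for v in healthy]
spike = det.update(5.0)                               # sudden deviation
print(any(flags), spike)
```

Normal samples are folded back into the window, so the baseline tracks slow natural drift, while a sudden deviation is flagged; the paper's As-HMM plays this role with a far richer probabilistic model of the degradation process.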
Clusters of emerging technologies are appearing with increasing frequency in HPC. After years of skepticism, data centers are adopting them as production systems thanks to several geopolitical and technological factors. The most notable example is the Fugaku supercomputer, powered by the latest Fujitsu A64FX CPU. How do mature HPC codes behave on such emerging-technology clusters? What performance will scientists obtain when running their HPC applications "as is" on them? This paper presents an evaluation of CTE-Arm, a Fugaku-like system, including both fine-tuned microbenchmarks and five scientific applications run without prior fine-tuning: Alya, NEMO, Gromacs, OpenIFS, and WRF. Results show that while the micro-architectural benchmarks perform as expected, HPC applications not tuned for the specific architecture run between 2× and 4× slower than on a standard Intel-based HPC system. Therefore, further effort is needed to improve tools (e.g., compilers) and system software (e.g., MPI libraries) to ease application deployment and improve performance.