Carlos Álvarez Martínez scite author profile

Abstract: Instruction memoization is a promising technique to reduce the power consumption and increase the performance of future low-end/mobile multimedia systems. Power and performance efficiency can be improved by reusing instances of an already executed operation. Unfortunately, this technique may not always be worth the effort due to the power consumption and area impact of the tables required to leverage an adequate level of reuse. In this paper, we introduce and evaluate a novel way of understanding multimedia floating-point operations based on the fuzzy computation paradigm: performance and power consumption can be improved at the cost of small precision losses in computation. By exploiting this implicit characteristic of multimedia applications, we propose a new technique called tolerant memoization. This technique expands the capabilities of classic memoization by associating entries with similar inputs to the same output. We evaluate this new technique by measuring the effect of tolerant memoization for floating-point operations in a low-power multimedia processor and discuss the trade-offs between performance and quality of the media outputs. We report energy improvements of 12 percent for a set of key multimedia applications with a small LUT of 6 Kbytes, compared to 3 percent obtained using previously proposed techniques. Index Terms: Low-power design, special-purpose and application-based systems, real-time and embedded systems.
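
The idea of associating similar inputs with the same output can be illustrated with a minimal software sketch. It is not the hardware design evaluated in the paper: the names `truncate_mantissa` and `TolerantMemoTable` are hypothetical, the choice of clearing low-order mantissa bits to decide which inputs count as "similar" is an assumption, and the unbounded dictionary stands in for the small fixed-size LUT mentioned in the abstract.

```python
import struct


def truncate_mantissa(x: float, dropped_bits: int) -> int:
    """Return the 32-bit float bit pattern of x with its low mantissa bits cleared."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << dropped_bits) & 0xFFFFFFFF
    return bits & mask


class TolerantMemoTable:
    """Memoize a two-operand float operation, keyed on truncated inputs.

    Assumption: inputs that agree after dropping `dropped_bits` mantissa bits
    are treated as the same entry, so a hit may return a slightly imprecise result.
    """

    def __init__(self, op, dropped_bits=8):
        self.op = op
        self.dropped_bits = dropped_bits
        self.table = {}  # unbounded here; a real LUT would be small and fixed-size
        self.hits = 0
        self.misses = 0

    def __call__(self, a, b):
        key = (
            truncate_mantissa(a, self.dropped_bits),
            truncate_mantissa(b, self.dropped_bits),
        )
        if key in self.table:
            self.hits += 1
            return self.table[key]  # reuse the result of a similar earlier operation
        self.misses += 1
        result = self.op(a, b)
        self.table[key] = result
        return result


# Usage: two nearly equal multiplications share one table entry.
mul = TolerantMemoTable(lambda a, b: a * b, dropped_bits=12)
first = mul(1.0001, 2.0)   # miss: computed and stored
second = mul(1.0002, 2.0)  # hit: reuses the stored result
```

With `dropped_bits=0` the sketch degenerates to classic exact-match memoization on single-precision inputs; raising it trades precision for a higher reuse rate, which is the trade-off the abstract describes.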

A Voxel-Based Analysis of FDG-PET in Traumatic Brain Injury: Regional Metabolism and Relationship between the Thalamus and Cortical Areas

García-Panach

Lull

et al. 2011

Journal of Neurotrauma

patients with different neurological outcomes. Methods: We studied 49 patients who had suffered a severe TBI and 10 healthy control subjects using 18F-FDG-PET. The patients were divided into three groups: the MCS&VS group (n=17), which included patients who were in a vegetative or a minimally conscious state; the In-PTA group (n=12), which included patients in post-traumatic amnesia (PTA); and the Out-PTA group (n=20), which included patients who had recovered from PTA. SPM5 software was used to determine the metabolic differences between the groups. FDG-PET images were normalized and four regions of interest were generated around the thalamus, precuneus and the frontal and temporal lobes. The groups were parameterized using the Student's t-test. Principal component analysis was used to obtain an intensity-estimated value per subject to correlate the function between the structures. Results: Differences in glucose metabolism in all structures were related to the neurological outcome, and the most severe patients showed the most severe hypometabolism. We also found a significant correlation between the cortico-thalamo-cortical metabolism in all groups. Conclusions: Voxel-based analysis suggests a functional correlation between these four areas and decreased metabolism was associated with less favorable outcome. Higher levels of activation of the cortico-cortical connections appear to be related to better neurological conditions. Differences in the thalamo-cortical correlations between patients and controls may be related to traumatic dysfunction due to focal or diffuse lesions.

General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models

Tan

Bosch

Vidal

et al. 2017

Hybrid Dataflow/von-Neumann Architectures

Yazdanpanah

Martínez

Jiménez-González

et al. 2014

IEEE Trans. Parallel Distrib. Syst.

Prospective Study of 1,000 Consecutive Patients with Pleural Effusion. Etiology of the Effusion and Characteristics of the Patients

Villena

Encuentra

Echave-Sustaeta

et al. 2002

Archivos de Bronconeumología

A Hardware Runtime for Task-Based Programming Models

Tan

Bosch

Martínez

et al. 2019

IEEE Trans. Parallel Distrib. Syst.

Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit task parallelism of applications over multicore, manycore and heterogeneous systems. However, their software-only runtimes introduce significant overhead when targeting fine-grained tasks, resulting in performance losses. To overcome this drawback, we present a hardware runtime, Picos++, that accelerates critical runtime functions such as task dependence analysis, nested task support, and heterogeneous task scheduling. As a proof of concept, the Picos++ hardware runtime has been integrated with a compiler infrastructure that supports parallel task-based programming models. An FPGA SoC running Linux OS has been used to implement the hardware-accelerated part of Picos++, integrated with a heterogeneous system composed of 4 symmetric multiprocessor (SMP) cores and several hardware functional accelerators (HwAccs) for task execution. Results show significant improvements in energy and performance compared to state-of-the-art parallel software-only runtimes. With Picos++, applications can achieve up to 7.6x speedup and save up to 90% of energy when using 4 threads and up to 4 HwAccs, and even reach a speedup of 16x over the software alternative when using 12 HwAccs and small tasks.
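
Task dependence analysis, one of the runtime functions named in the abstract, can be sketched in software as follows. This is only an illustration of the general technique, not the Picos++ design: the class `DependenceTracker`, its `submit` method, and the use of plain strings as data addresses are all hypothetical. Each task declares the data it reads (`ins`) and writes (`outs`), and the tracker derives which earlier tasks it must wait for.

```python
from collections import defaultdict


class DependenceTracker:
    """Derive task predecessors from declared inputs and outputs (illustrative only)."""

    def __init__(self):
        self.last_writer = {}             # address -> task that last wrote it
        self.readers = defaultdict(list)  # address -> tasks reading since the last write
        self.deps = {}                    # task -> set of predecessor tasks

    def submit(self, task_id, ins=(), outs=()):
        preds = set()
        for addr in ins:
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])  # read-after-write
        for addr in outs:
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])  # write-after-write
            preds.update(self.readers[addr])       # write-after-read
        # Record this task's accesses for tasks submitted later.
        for addr in ins:
            self.readers[addr].append(task_id)
        for addr in outs:
            self.last_writer[addr] = task_id
            self.readers[addr] = []
        self.deps[task_id] = preds
        return preds


# Usage: B and C read what A wrote; D overwrites it and must wait for all three.
tracker = DependenceTracker()
deps_a = tracker.submit("A", outs=["x"])
deps_b = tracker.submit("B", ins=["x"], outs=["y"])
deps_c = tracker.submit("C", ins=["x"])
deps_d = tracker.submit("D", outs=["x"])
```

A software runtime performs this bookkeeping for every task it creates, which is why the cost becomes noticeable for fine-grained tasks and why the abstract proposes moving it into hardware.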

Analysis of the Task Superscalar Architecture Hardware Design

Yazdanpanah

Jiménez-González

Martínez

et al. 2013

Procedia Computer Science

Application Acceleration on FPGAs with OmpSs@FPGA

Bosch

Tan

Filgueras

et al. 2018

OmpSs@FPGA is the flavor of OmpSs that allows offloading application functionality to FPGAs. Similarly to OpenMP, it is based on compiler directives. While the OpenMP specification also includes support for heterogeneous execution, we use OmpSs and OmpSs@FPGA as a prototype implementation to develop new ideas for OpenMP. OmpSs@FPGA implements the tasking model with runtime support to automatically exploit all SMP and FPGA resources available in the execution platform. In this paper, we present the OmpSs@FPGA ecosystem, based on the Mercurium compiler and the Nanos++ runtime system. We show how the applications are transformed to run on the SMP cores and the FPGA. The application kernels defined as tasks to be accelerated using the OmpSs directives are: 1) transformed by the compiler into kernels connected with the proper synchronization and communication ports, 2) extracted to intermediate files, 3) compiled through the FPGA vendor HLS tool, and 4) used to configure the FPGA. Our Nanos++ runtime system schedules the application tasks on the platform, being able to use the SMP cores and the FPGA accelerators at the same time. We present the evaluation of the OmpSs@FPGA environment with the Matrix Multiplication, Cholesky and N-Body benchmarks, showing the internal details of the execution and the performance obtained on a Zynq Ultrascale+ MPSoC (up to 128x). The source code uses OmpSs@FPGA annotations, and different Vivado HLS optimization directives are applied for acceleration.
