Mauricio Alvarez Mesa scite author profile

An important question is whether emerging and future applications exhibit sufficient parallelism, in particular thread-level parallelism, to exploit the large numbers of cores that future chip multiprocessors (CMPs) are expected to contain. As a case study we investigate the parallelism available in video decoders, an important application domain now and in the future. Specifically, we analyze the parallel scalability of the H.264 decoding process. First we discuss the data structures and dependencies of H.264 and show what types of parallelism they expose. We also show that previously proposed parallelization strategies, such as slice-level, frame-level, and intra-frame macroblock (MB) level parallelism, are not sufficiently scalable. Based on the observation that inter-frame dependencies have a limited spatial range, we propose a new parallelization strategy called Dynamic 3D-Wave. It allows certain MBs of consecutive frames to be decoded in parallel. Using this new strategy we analyze the limits of the available MB-level parallelism in H.264. Using real movie sequences we find a maximum MB parallelism ranging from 4000 to 7000. We also perform a case study to assess the practical value and possibilities of a highly parallelized H.264 application. The results show that H.264 exhibits sufficient parallelism to efficiently exploit the capabilities of future manycore CMPs.
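To illustrate why intra-frame MB-level parallelism alone is limited, the following is a minimal sketch of a 2D wavefront profile. It rests on two assumptions that the abstract does not spell out: that each MB depends on its left, upper-left, upper, and upper-right neighbours, and that every MB takes one time slot to decode. The function name `wavefront_profile` is hypothetical. The sketch covers a single frame only and does not model the Dynamic 3D-Wave strategy or reproduce the 4000 to 7000 figure.

```python
from collections import Counter


def wavefront_profile(mb_cols: int, mb_rows: int) -> list[int]:
    """Return how many MBs can be decoded in each time slot of one frame.

    Assumption: MB (x, y) waits for its left, upper-left, upper, and
    upper-right neighbours, and each MB takes exactly one time slot.
    Under that assumption the earliest slot for MB (x, y) is x + 2*y.
    """
    slots = Counter(x + 2 * y for y in range(mb_rows) for x in range(mb_cols))
    return [slots[t] for t in range(max(slots) + 1)]


# A 1920x1088 frame holds 120 x 68 macroblocks of 16x16 samples.
profile = wavefront_profile(120, 68)
print("peak parallel MBs:", max(profile))
print("time slots per frame:", len(profile))
```

Under these assumptions the peak within one frame is bounded by roughly half the frame width in MBs, which is why a strategy that also overlaps consecutive frames is needed to reach much higher parallelism.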

The SARC Architecture

Ramírez

Cabarcas

Juurlink

et al. 2010

IEEE Micro

The SARC architecture is composed of multiple processor types and a set of user-managed direct memory access (DMA) engines that let the runtime scheduler overlap data transfer and computation. The runtime system automatically allocates tasks on the heterogeneous cores and schedules the data transfers through the DMA engines. SARC's programming model supports various highly parallel applications, with matching support from specialized accelerator processors. On-chip parallel computation shows great promise for scaling raw processing performance within a given power budget. However, chip multiprocessors (CMPs) often struggle with programmability and scalability issues such as cache coherency and off-chip memory bandwidth and latency.

Scalability of Macroblock-level Parallelism for H.264 Decoding

Mesa

Ramírez

Azevedo

et al. 2009

This paper investigates the scalability of macroblock (MB) level parallelization of the H.264 decoder for high-definition (HD) applications. The study has three parts. First, it presents a formal model for predicting the maximum achievable performance, taking into account the variable processing time of tasks and thread synchronization overhead. Second, it describes an implementation on a real multiprocessor architecture, including a comparison of different scheduling strategies and a profiling analysis that identifies the performance bottlenecks. Finally, a trace-driven simulation methodology is used to identify acceleration opportunities for removing the main bottlenecks, including the acceleration potential of the entropy decoding stage and of thread synchronization and scheduling. The study presents a quantitative analysis of the main bottlenecks of the application and estimates the acceleration levels required to make the MB-level parallel decoder scalable.

An evaluation of current SIMD programming models for C++

Pohl

Cosenza

Mesa

et al. 2016

SIMD extensions were added to microprocessors in the mid-1990s to speed up data-parallel code through vectorization. Unfortunately, the SIMD programming model has barely evolved, and the most efficient utilization is still obtained with elaborate intrinsics coding. As a consequence, several approaches to writing efficient and portable SIMD code have been proposed. In this work, we evaluate current programming models for the C++ language that claim to simplify SIMD programming while maintaining high performance. The proposals were assessed by implementing two kernels: one standard floating-point benchmark and one real-world integer-based application, both highly data parallel. Results show that the proposed solutions perform well for the floating-point kernel, achieving close to the maximum possible speed-up. For the real-world application, the programming models exhibit significant performance gaps due to data type issues, missing template support, and other problems discussed in this paper.

PSNC advanced multimedia and visualization infrastructures, services and applications

Kurowski

Glowiak

Ludwiczak

et al. 2018

Poznan Supercomputing and Networking Center (PSNC) offers advanced visualisation and multimedia infrastructures as a set of dedicated laboratories for conducting innovative research and development projects involving both academia and industry. In this short overview we present the existing facilities located at the PSNC campus in Poznan, Poland, as well as short descriptions of example applications and networked services that have been developed recently.

CCS Concepts: Human-centered computing → Visualization; Computing methodologies → Graphics systems and interfaces.

Instrumentación para captura y transmisión de señales de vibración (Instrumentation for the capture and transmission of vibration signals)

Hernández

Echeverry

Mesa

et al. 2018

Vis. Electron.

Vibration signals are commonly used to detect faults in rotating machinery. Several methodologies currently exist for performing analysis based on these signals. One widely used methodology is Condition-Based Maintenance (CBM), a form of scheduled maintenance that recommends actions based on collected information. Wireless Sensor Networks (WSNs) are now commonly used to acquire vibration signals. A WSN is a network made up of a number of nodes, each equipped with a sensor that detects a physical phenomenon such as light, pressure, or temperature. This article proposes a robust WSN-based system for the acquisition, storage, and transmission of vibration signals, which combines a signal-conditioning mechanism, a central board, and a device for wireless transmission. The proposed system performs all of these functions automatically and accurately for two vibration signals and one speed signal.

A Technology for Building Web-Based Laboratories for Teaching Electronics

Arboleda

Mesa

Cobo

2003
