Marcio Merino Fernandes scite author profile

Llosa

Topham

This paper presents results on a new approach to partitioning a modulo-scheduled loop for distributed execution on parallel clusters of functional units organized as a VLIW machine. A distinctive characteristic of this architecture is the use of register files organized by means of queues, which results in a number of advantages over conventional schemes, but also requires the development of specific compiling and hardware features. We have investigated a scheme based on copy operations to deal with data values to be consumed more than once during loop execution. Experiments with loop unrolling were also pelformed in order to optimize both loop execution and the use of machine resources. A partitioning algorithm has been implemented to perjorm some experiments with the clustered architecture model, an organization widely accepted as being essential for very wide issue machines. '

Distributed modulo scheduling

Llosa

Topham³

1999

Wide-issue ILP machines can be built using the VLIW approach as many of the hardware complexities found in superscalar processors can be transferred to the compiler. However, the scalability of VLIW architectures is still constrained by the size and number of ports of the register file required by a large number of functional units. Organizations composed by clusters of a few functional units and small private register files have been proposed to deal with this problem, an approach highly dependent on scheduling and partitioning strategies. This paper presents DMS, an algorithm that integrates modulo scheduling and code partitioning in a single procedure. Experimental results have shown the algorithm is effective for configurations up to 8 clusters, or even more when targeting vectorizable loops.

LALP: A Language to Program Custom FPGA-Based Acceleration Engines

Menotti

Cardoso

et al. 2011

Int J Parallel Prog

Field-Programmable Gate Arrays (FPGAs) are becoming increasingly important in embedded and high-performance computing systems. They allow performance levels close to the ones obtained with Application-Specific Integrated Circuits, while still keeping design and implementation flexibility. However, to efficiently program FPGAs, one needs the expertise of hardware developers in order to master hardware description languages (HDLs) such as VHDL or Verilog. Attempts to furnish a high-level compilation flow (e.g., from C programs) still have to address open issues before broader efficient results can be obtained. Bearing in mind an FPGA available resources, it has been developed LALP (Language for Aggressive Loop Pipelining), a novel language to program FPGA-based accelerators, and its compilation framework, including mapping capabilities. The main ideas behind LALP are to provide a higher abstraction level than HDLs, to exploit the intrinsic parallelism of hardware resources, and to allow the programmer to control execution stages whenever the compiler techniques are unable to generate efficient implementations. Those features are particularly useful to implement loop pipelining, a well regarded technique used to R. Menotti (B) · M. M. Fernandes accelerate computations in several application domains. This paper describes LALP, and shows how it can be used to achieve high-performance computing solutions.

A Mersenne Twister Hardware Implementation for the Monte Carlo Localization Algorithm

Bonato

Mazzotti

et al. 2012

J Sign Process Syst

Practical Education Fostered by Research Projects in an Embedded Systems Course

Bonato

International Journal of Reconfigurable Computing

Cardoso

et al. 2014

The very nature of universities makes them unique environments for research and teaching. Although both activities constantly borrow from each other, a deeper level of interaction is not always achieved for several reasons. This paper presents a successful experience on conducting an undergraduate course on embedded systems, based on strong interaction with related research activities previously conducted by the authors. Known for being everywhere, embedded systems are constantly expanding in both complexity and volume production. In addition, heterogeneous systems are becoming prevalent in modern applications, standing as an additional difficulty to students in this area. In this context, this paper presents experiences in teaching embedded systems using a project-based learning pedagogical approach, with strong emphasis on mobile robotic applications previously developed by MSc and PhD students. As a result, it has been observed that undergraduate students have the opportunity to build a strong background and feel better prepared to face the challenges to be found in their future professional activities.