Abstract-Augmenting a processor with special hardware that is able to apply a Single Instruction to Multiple Data (SIMD) at the same time is a cost effective way of improving processor performance. It also offers a means of improving the ratio of processor performance to power usage due to reduced and more effective data movement and intrinsically lower instruction counts.This paper considers and compares the NEON SIMD instruction set used on the ARM Cortex-A series of RISC processors with the SSE2 SIMD instruction set found on Intel platforms within the context of the Open Computer Vision (OpenCV) library. The performance obtained using compiler auto-vectorization is compared with that achieved using hand-tuning across a range of five different benchmarks and ten different hardware platforms. On the ARM platforms the hand-tuned NEON benchmarks were between 1.05× and 13.88× faster than the auto-vectorized code, while for the Intel platforms the hand-tuned SSE benchmarks were between 1.34× and 5.54× faster.
In the construction of exascale computing systems energy efficiency and power consumption are two of the major challenges. Low-power high performance embedded systems are of increasing interest as building blocks for large scale highperformance systems. However, extracting maximum performance out of such systems presents many challenges. Various aspects from the hardware architecture to the programming models used need to be explored. The Epiphany architecture integrates low-power RISC cores on a 2D mesh network and promises up to 70 GFLOPS/Watt of processing efficiency. However, with just 32 KB of memory per eCore for storing both data and code, and only low level inter-core communication support, programming the Epiphany system presents several challenges. In this paper we evaluate the performance of the Epiphany system for a variety of basic compute and communication operations. Guided by this data we explore strategies for implementing scientific applications on memory constrained low-powered devices such as the Epiphany. With future systems expected to house thousands of cores in a single chip, the merits of such architectures as a path to exascale is compared to other competing systems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.