Our ability to create systems with large amount of hardware parallelism is exceeding the average software developer's ability to effectively program them. This is a problem that plagues our industry. Since the vast majority of the world's software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient core and vector parallelism is essential to enabling the use of multi-and many-core processor architectures. However, hardware architectures and vector ISAs are also shifting and diversifying quickly, making it difficult for a single binary to run well on all possible targets. Because of this, retargetability and dynamic compilation are of growing relevance. This paper introduces Intel® Array Building Blocks (ArBB), which is a retargetable dynamic compilation framework. This system focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures, while staying within standard C++. ArBB interoperates with other programming models to help meet the demands we hear from customers for a solution with both greater programmer productivity and good performance.This work makes contributions in language features, compiler architecture, code transformations and optimizations. It presents performance data from the current beta release of ArBB and quantitatively shows the impact of some key analyses, enabling transformations and optimizations for a variety of benchmarks that are of interest to our customers.
A novel planar monopole antenna with dual‐band circularly polarisation (CP) is presented here. The antenna consists of a modified L‐shaped monopole, an inverted‐L strip, and a modified ground. The utilisation of the modified L‐shaped monopole not only generates dual‐band operation, but also excites CP radiation in the lower band. In order to achieve CP in the upper band, a parasitic inverted‐L strip is added on the top plane of the substrate and a modified ground is designed. The proposed antenna has been fabricated and tested, the measured −10 dB reflection coefficient bandwidths are 370 MHz (2.38–2.85 GHz) in the lower band and 2330 MHz (4.05–6.38 GHz) in the upper band. The measured corresponding 3 dB axial ratio (AR) bandwidths are 610 MHz (2.39–3 GHz) and 850 MHz (5.15–6 GHz), respectively. The overlapped −10 dB reflection coefficient and AR bandwidths can totally cover the WLAN bands, Wi‐Fi bands, Bluetooth band, and partly cover the WiMAX bands. The proposed antenna owns bidirectional radiation characteristic and reasonable gain at both the lower band and the upper band.
As parallelism in microprocessors becomes mainstream, new programming languages and environments are emerging to meet the challenges of parallel programming. To support research on these languages, we are developing a lowlevel language infrastructure called Pillar (derived from Parallel Implementation Language). Although Pillar programs are intended to be automatically generated from source programs in each parallel language, Pillar programs can also be written by expert programmers. The language is defined as a small set of extensions to C. As a result, Pillar is familiar to C programmers, but more importantly, it is practical to reuse an existing optimizing compiler like gcc [1] or Open64 [2] to implement a Pillar compiler. Pillar's concurrency features include constructs for threading, synchronization, and explicit data-parallel operations. The threading constructs focus on creating new threads only when hardware resources are idle, and otherwise executing parallel work within existing threads, thus minimizing thread creation overhead. In addition to the usual synchronization constructs, Pillar includes transactional memory. Its sequential features include stack walking, second-class continuations, support for precise garbage collection, tail calls, and seamless integration of Pillar and legacy code. This paper describes the design and implementation of the Pillar software stack, including the language, compiler, runtime, and high-level converters (that translate high-level language programs into Pillar programs). It also reports on early experience with three high-level languages that target Pillar. for these languages will have strong similarities. Pillar factors out these similarities and provides a single set of components to ease the implementation and optimization of a compiler and its runtime for any parallel language. The core idea of Pillar is to define a low-level language and runtime that can be used to express the sequential and concurrency features of higher-level parallel languages. The Pillar infrastructure consists of three main components: the Pillar language, a Pillar compiler, and the Pillar runtime.
The working set size of Java applications on embedded systems has recently been increasing, causing the Translation Lookaside Buffer (TLB) to become a serious performance bottleneck. From a thorough analysis of the SPECjvm98 benchmark suite executing on a commodity embedded system, we find TLB misses attribute from 24% to 50% of the total execution time. We explore and evaluate a wide spectrum of TLB-enhancing techniques with different combinations of software/hardware approaches, namely superpage for reducing TLB miss rates, two-level TLB and TLB prefetching for reducing both TLB miss rates and TLB miss latency, and even a no-TLB design for removing TLB overhead completely. We adapt and then in a novel way extend these approaches to fit the design space of embedded systems executing Java code. We compare these approaches, discussing their performance behavior, software/hardware complexity and constraints, especially the design implications for the application, runtime and OS.We first conclude that even with the aggressive approaches presented, there remains a performance bottleneck with the TLB. Second, in addition to facing very different design considerations and constraints for embedded systems, proven hardware techniques, such as TLB prefetching have different performance implications. Third, software based solutions, no-TLB design and superpaging, appear to be more effective in improving Java application performance on embedded systems. Finally, beyond performance, these approaches have their respective pros and cons; it is left to the system designer to make the appropriate engineering tradeoff.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.