International audienceAs complexity of embedded system increases, configurable hardware is becoming more attractive because it provides a fast and efficient basis for design development. As a consequence, one of the most promising embedded architecture consists in the replication of Processing Elements (PEs)connected through a Network-on-Chip (NoC). Such architectures provide computation parallelism, scalability, and reduced design time thanks to reusability. This paper describes the development of a scalable, distributed memory, open-source NoC-based platform called Open-Scale and itsimplementation into FPGA devices. The main objective of this platform is to provide a complete framework for research development on NoC-based distributed memory MPSoCs
This paper presents a Multi-Processor System-on-Chip platform which is capable of load balancing at run-time. The system is purely distributed in the sense that each processor is capable of making decisions on its own, without having relying by any central unit. All the management is ensured by a very tiny preemptive RTOS (run-time operating system) running on every processor which is mainly responsible for running and distributing tasks among the processing elements (PEs). The goal of such strategy is to improve the performance of the system while ensuring scalability of the design. In order to validate the concepts, we have conducted some experiments with a widely used multimedia application: the MJPEG (Motion JPEG) decoder. Obtained results show that the overhead caused by the task migration mechanism is amortized by the gain in term of performance.
Scalability and programmability are important issues in large homogeneous MPSoCs. Such architectures often rely on explicit message-passing among processors, each of which possessing a local private memory. This paper presents a low-overhead hardware/software distributed shared memory approach that makes such architectures multithreading-capable. The proposed solution is implemented into an open-source message-passing MPSoC through developing a POSIX-like thread API, which shows excellent scalability using application kernels used for benchmarking in shared-memory systems. This approach efficiently draws strengths from the on-chip distributed private memory that opens the way to exposing the multithreading programmability/capabilities of that component as a generalpurpose accelerator.
The growing concerns of power efficiency, silicon reliability and performance scalability motivate research in the area of adaptive embedded systems, i.e. systems endowed with decisional capacity, capable of online decision making so as to meet certain performance criteria. The scope of possible adaptation strategies is subject to the targeted architecture specifics, and may range from simple scenario-driven frequency/voltage scaling to rather complex heuristic-driven algorithm selection. This paper advocates the design of distributed memory homogeneous multiprocessor systems as a suitable template for best exploiting adaptation features, thereby tackling the aforementioned challenges. The proposed solution lies in the combined use of a typical application processor for global orchestration along with such an adaptive multiprocessor core for the handling of data-intensive computation. This paper describes an exploratory homogeneous multiprocessor template designed from the ground up for scalability and adaptation. The proposed contributions aim at increasing architecture efficiency through smart distributed control of architectural parameters such as frequency, and enhanced techniques for load balancing such as task migration and dynamic multithreading.
Adaptive multiprocessor systems are appearing as a promising solution for dealing with complex and unpredictable scenarios. Given the large variety of possible use cases that these platforms must support and the resulting workload variability, offline approaches are no longer sufficient because they do not allow coping with time changing workloads. This letter presents a novel approach based on the utilization of PI and PID controllers, widely used in control automation, for optimizing resources utilization in Multiprocessor System-on-Chip (MPSoC). Several architecture characteristics such as response time during frequency changing, noise and perturbations are modeled and validated in a high-level model and results are compared to information obtained on a homogeneous MPSoC platform prototype. Power and energy consumption figures are discussed and two controllers are proposed: 1) PI-; and 2) PID-based controllers. Results show the system capability of adapting under disturbing conditions while ensuring application performance constraints and reducing energy consumption.
In this paper we propose a strategy for better exploiting Multi-Processor Systems-on-Chip resources utilization by means of using a control-loop feedback mechanism. We apply the proposed techniques in a purely distributed memory MPSoC architecture that is composed of a frequency scaling module responsible for tuning the frequency of processors at run-time. Results show very promising in terms of adaptation capabilities for system with dynamic workload. Performance results demonstrate the effectiveness of the proposed approach when workload requirements for applications may vary, affecting the overall performance of the system. For validating the proposed approach we have implemented a multi-thread MJPEG decoder application and created an architecture model with/without perturbations in the system.
Message-passing is an increasingly popular design style for MPSoCs that usually results in systems that perform better compared to external shared-memory designs performance and power-wise, this because of much decreased data transfers with external memory. This scheme relies on explicit communications between processing tasks that participate in the application. Contrarily to shared-memory multiprocessors, tasks usually get assigned to processors at design-time. In order to cope with transient performance losses originating from various phenomena such as increased processing workload or peak traffic in the communication subsystem, various adaptation mechanisms based on task migration have been proposed in the literature. As Message-passing systems usually use PE-private memory architecture, these mechanisms imply migrating application code from processor to processor, which incurs penalty in performance and power consumption. This paper proposes a local shared-memory strategy in which processors execute code hosted in a remote processor.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.