Stochastic hill climbing algorithm is adapted to rapidly find the appropriate start node in the application mapping of networkbased many-core systems. Due to highly dynamic and unpredictable workload of such systems, an agile run-time task allocation scheme is required. The scheme is desired to map the tasks of an incoming application at run-time onto an optimum contiguous area of the available nodes. Contiguous and unfragmented area mapping is to settle the communicating tasks in close proximity. Hence, the power dissipation, the congestion between different applications, and the latency of the system will be significantly reduced. To find an optimum region, we first propose an approximate model that quickly estimates the available area around a given node. Then the stochastic hill climbing algorithm is used as a search heuristic to find a node that has the required number of available nodes around it. Presented agile climber takes the steps using an adapted version of hill climbing algorithm named Smart Hill Climbing, SHiC, which takes the runtime status of the system into account. Finally, the application mapping is performed starting from the selected first node. Experiments show significant gain in the mapping contiguousness which results in better network latency and power dissipation, compared to state-of-the-art works.
Aggressive technology scaling triggers novel challenges to the design of multi-/many-core systems, such as limited power budget and increased reliability issues. Today's many-core systems employ dynamic power management and runtime mapping strategies trying to offer optimal performance while fulfilling power constraints. On the other hand, due to the reliability challenges, online testing techniques are becoming a necessity in current and near future technologies. However, state-of-the-art techniques are not aware of the other power/ performance requirements. This paper proposes a power-aware non-intrusive online testing approach for many-core systems. The approach schedules software based self-test routines on the various cores during their idle periods, while honoring the power budget and limiting delays in the workload execution. A test criticality metric, based on a device aging model, is used to select cores to be tested at a time. Moreover, power and reliability issues related to the testing at different voltage and frequency levels are also handled. Extensive experimental results reveal that the proposed approach can i) efficiently test the cores within the available power budget causing a negligible performance penalty, ii) adapt the test frequency to the current cores' aging status, and iii) cover available voltage and frequency levels during the testing.
In Networks-on-chip, increasing the depth of routers' buffers even by a few stages can have a significant effect on average latency and saturation threshold of the network. However, the price to pay could be high in terms of power and silicon area. In this paper, we propose a low power, high throughput asynchronous FIFO suitable for buffers of GALS NoC routers. We consistently compare the performance with regards to power, area and throughput of our FIFO with some different FIFO structures, by exploring their design trade-offs with various number of stages and for different data lengths. These structures are simulated in 90nm CMOS technology with accurate spice simulations, where results show a low power consumption and latency, with a higher throughput. Finally, a back-annotated HDL model of a 4x4 mesh network, wherein a fully asynchronous router is implemented, shows better average latency, saturation threshold and power tradeoffs, using the proposed FIFO.
Synchronization issues such as metastability in multi-clock domain systems have become a big problem, reducing data transmission throughput between domains. In this paper, a high-throughput, metastability-free data transmission channel based on pausible clock method in Globally-Asynchronous Locally-Synchronous (GALS) systems is proposed. This channel can be used as the interconnection of mixed-clock synchronous IP cores without having concerns about their synchronization. We show that the probability of metastability in our design is practically zero; and this without loss of throughput and latency, allowing the transmitter and receiver to operate with their own maximum clock frequency. The proposed channel is simulated in 90nm CMOS process using Predictive Technology Model (PTM) library. Gate delays and power parameters are extracted from Spice simulations and are back annotated into our channel HDL code. The throughput, latency and power are analyzed and compared with existing designs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.