Many-core processor based systems gain popularity in high-performance parallel embedded applications. Estimating memory bandwidth requirement, i.e. external memory bandwidth, given various cache size for target parallel applications requires a prohibitively large simulation time. In this work, we propose an analytical model to quickly estimate the memory bandwidth for a given cache size and help exploring trade-offs between cache sizes and memory bandwidth requirement. We model the stochastic behavior of cache misses for a single cache as a random process. Using central limit theorems for identically or non-identically distributed random processes, we accurately estimate the collective cache misses from hundreds of processor cores and thus the total memory bandwidth requirement for the whole system. The results show that our model improves a speed of simulation time up to 200.4 times for 200 cores whereas its estimated results achieve less than 0.01% difference from the simulated ones for 200 cores in terms of accuracy.
Abstract-NAND flash memory (NFM) based storage devices, e.g. Solid State Drive (SSD), are rapidly replacing conventional storage devices, e.g. Hard Disk Drive (HDD). As NAND flash memory technology advances, its specification has evolved to support denser cells and larger pages and blocks. However, efforts to fully understand their impacts on design objectives such as performance, power, and cost for various applications are often neglected. Our research shows this recent trend can adversely affect the design objectives depending on the characteristics of applications. Past works mostly focused on improving the specific design objectives of NFM based systems via various architectural solutions when the specification of NFM is given. Several other works attempted to model and characterize NFM but did not access the system-level impacts of individual parameters. To the best of our knowledge, this paper is the first work that considers the specification of NFM as the design parameters of NAND flash storage devices (NFSDs) and analyzes the characteristics of various synthesized and real traces and their interaction with design parameters. Our research shows that optimizing design parameters depends heavily on the characteristics of applications. The main contribution of this research is to understand the effects of low-level specifications of NFM, e.g. cell type, page size, and block size, on system-level metrics such as performance, cost, and power consumption in various applications with different characteristics, e.g. request length, update ratios, read-and-modify ratios. Experimental results show that the optimized page and block size can achieve up to 15 times better performance than the conventional NFM configuration in various applications. The results can be used to optimize the system-level objectives of a system with specific applications, e.g. embedded systems with NFM chips, or predict the future direction of NFM.
Abstract:A packet memory controller in routers accesses the packet memory according to the QoS requirements of packets. The previous QoS-aware controller using a feedback control loop degenerates into round robin scheduling under temporary overload and suffers from slow response. We propose a new packet memory controller that estimates input load accurately and rapidly and schedules different classes using a flexible bitmap scheduler. The results show that under temporary overload or rapidly changing input loads, it can successfully meet the latency requirements by showing only less than 2% difference from the requirement of the high priority class.
SUMMARYA packet memory stores packets in internet routers and it requires typically RT T × C for the buffer space, e.g. several GBytes, where RT T is an average round-trip time of a TCP flow and C is the bandwidth of the router's output link. It is implemented with DRAM parts which are accessed in parallel to achieve required bandwidth. They consume significant power in a router whose scalability is heavily limited by power and heat problems. Previous work shows the packet memory size can be reduced to, where N is the number of long-lived TCP flows. In this paper, we propose a novel packet memory architecture which splits the packet memory into on-chip and off-chip packet memories. We also propose a low-power packet mapping method for this architecture by estimating the latency of packets and mapping packets with small latencies to the on-chip memory. The experimental results show that our proposed architecture and mapping method reduce the dynamic power consumption of the off-chip memory by as much as 94.1% with only 50% of the packet buffer size suggested by the previous work in realistic scenarios.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.