In this article, we propose an application-specific demand paging mechanism for low-end embedded systems that have flash memory as secondary storage. These systems are not equipped with virtual memory. A small memory space called an execution buffer is used to page the code of an application. An application-specific page manager manages the buffer. The page manager is automatically generated by a compiler post-pass optimizer and combined with the application image. The post-pass optimizer analyzes the executable image and transforms function call/return instructions into calls to the page manager. As a result, each function in the code can be loaded into memory on demand at runtime. To minimize the overhead incurred by the demand paging technique, code clustering algorithms are also presented. We evaluate our techniques with ten embedded applications; on average, our approach reduces the code memory size by 39.5% with less than 10% performance degradation and 14% additional energy consumption. Our demand paging technique provides embedded system designers with a mechanism for controlling the trade-off among cost, performance, and energy efficiency. Designers can choose the code memory size depending on their cost, energy, and performance requirements.
There is increasing interest in explicitly managed memory hierarchies, where a hierarchy of distinct memories is exposed to the programmer and managed explicitly in software. Such hierarchies can be found in typical embedded systems and in an emerging class of multicore architectures. To run an application that requires more code memory than the available higher-level memory, an overlay structure is typically needed. The overlay structure is generated manually by the programmer or automatically by a specialized linker. Manual code overlaying requires the programmer to understand the program structure deeply to achieve maximum memory savings with minimum performance degradation. Although the linker can generate the code overlay structure automatically, its memory savings are limited and it can even cause significant performance degradation, because traditional techniques do not consider the program context. In this article, we propose an automatic code overlay generation technique that overcomes the limitations of traditional automatic code overlaying techniques. We target a system context that imposes two distinct constraints: (1) no hardware support for address translation and (2) a spatially and temporally coarse-grained faulting mechanism at the function level. Our approach addresses those two constraints as efficiently as possible. Our technique statically computes the Worst-Case Number of Conflict misses (WCNC) between two different code segments using path expressions. Then, it constructs a static temporal relationship graph from the WCNCs and emits an overlay structure for a given higher-level memory size. We also propose an inter-procedural partial redundancy elimination technique that minimizes redundant code copying caused by the generated overlay structure. Experimental results show that our approach is promising.