Dynamic binary translation (DBT) can provide security, virtualization, resource management and other desirable services to embedded systems. Although DBT has many benefits, its run-time performance overhead can be relatively high. The run-time overhead is important in embedded systems due to their slow processor clock speeds, simple microarchitectures, and small caches. This paper addresses how to implement efficient DBT for ARM-based embedded systems, taking into account instruction set and cache/TLB nuances. We develop several techniques that reduce DBT overhead for the ARM. Our techniques focus on cache and TLB behavior. We tested the techniques on an ARM-based embedded device and found that DBT overhead was reduced by 54% in comparison to a general-purpose DBT configuration that is known to perform well, thus further enabling DBT for a wide range of purposes.
Multithreaded applications can simultaneously execute on a chip multiprocessor computer, starting and stopping without warning or pattern. The behavior of each program can be different, interacting in unexpected ways, including causing competition for CPU cycles, which harms performance. To maximize program performance in this type of dynamic execution environment, the interactions among applications must be controlled. These interactions can be controlled by carefully choosing the number of threads for each multithreaded application (i.e., a system configuration). To choose a configuration, we advocate using program utility models to predict application behavior. Only a system that is capable of predicting and analyzing performance under multiple configurations can choose the best configuration and thus robustly meet its performance goals. In this paper, we present such a system. Our approach first gathers profile data. The profile data is used by multiple linear regression to build a utility model. The model takes into account program scalability, susceptibility to interference, and any inherent leveling off of performance as a program's thread count is increased. A utility model is constructed for each application. When the system workload changes, the utility models are consulted to find the new configuration that maximizes system performance while meeting each program's quality of service goals. We use multithreaded applications from PARSEC to evaluate our approach. Compared to the best traditional policy, which does not consider variances in the dynamic workload, our approach simultaneously improves system throughput by 19.3% while meeting user performance constraints 28% more often.
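The abstract above does not give the model's exact form, so the following is only a minimal sketch of the general idea, under stated assumptions. It fits a per-application utility model with ordinary least squares over three hypothetical features (own thread count for scalability, its square for leveling off, and co-runner thread count for interference), then searches thread-count configurations for the one with the highest total predicted utility that still meets each application's quality of service floor. The feature set, the function names (`fit_linear`, `features`, `predict`, `choose_configuration`), the exhaustive search, and the sample data are all illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]

def features(threads, other_threads):
    # Assumed features: intercept, scalability, leveling off, interference.
    return [1.0, float(threads), float(threads) ** 2, float(other_threads)]

def predict(coeffs, threads, other_threads):
    return sum(c * f for c, f in zip(coeffs, features(threads, other_threads)))

def choose_configuration(models, qos, cores):
    """Exhaustively pick thread counts that maximize total predicted utility.

    Returns a tuple of thread counts (one per application), or None if no
    configuration fits on `cores` while meeting every QoS floor.
    """
    best, best_total = None, float("-inf")
    for config in product(range(1, cores + 1), repeat=len(models)):
        if sum(config) > cores:
            continue  # do not oversubscribe the machine
        utils = [predict(m, n, sum(config) - n) for m, n in zip(models, config)]
        if any(u < q for u, q in zip(utils, qos)):
            continue  # some application would miss its QoS goal
        if sum(utils) > best_total:
            best, best_total = config, sum(utils)
    return best

# Hypothetical profile samples: (threads, co-runner threads, measured throughput).
profile = [(1, 0, 1.9), (2, 0, 3.6), (4, 0, 6.3), (8, 0, 9.5),
           (2, 2, 3.2), (4, 4, 5.6), (1, 4, 1.1), (8, 2, 9.0)]
model = fit_linear([features(n, o) for n, o, _ in profile],
                   [y for _, _, y in profile])
config = choose_configuration([model, model], qos=[1.0, 1.0], cores=8)
```

Exhaustive search is used here only because it is easy to read; it grows exponentially with the number of applications, so a real system would likely need a cheaper search strategy.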
Autonomic multicore systems dynamically adapt themselves in response to run-time conditions and information for a variety of purposes, such as fault tolerance, power conservation, and performance balancing. Multiple application processes must coordinate their efforts and share resources to achieve system goals. In this paper, we present our inflate/deflate programming model for building autonomic processes and systems. The inflate/deflate programming model provides application-specific knowledge and reactions to a central resource coordinator. The central resource coordinator distributes and revokes resources at runtime to achieve a system goal. We discuss the overall design and challenges involved in our model. We test our design for adaptable programs by modifying programs from the PARSEC benchmark suite. The programs are tested in two sample situations to explore the difficulties of modification and the rewards gained. We find that the first modified program (blackscholes) fairly shares CPU time with other system workloads in an energy conservation scenario (up to 50% more efficient than an unmodified blackscholes). The second modified program (dedup) dynamically takes advantage of core resources as they become available (17% faster performance). If no new cores become available, it is able to more efficiently use existing resources (9% faster performance).
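The abstract above describes the inflate/deflate model only at a high level, so the following is a rough sketch of how such an interface might look, not the authors' API. Each process exposes hypothetical `inflate` and `deflate` reactions that encode its own limits, and a hypothetical `Coordinator` grants and revokes cores at run time. All class names, method names, and the round-robin policy are assumptions made for illustration.

```python
class AdaptiveProcess:
    """A process that knows how to grow and shrink its own core usage."""

    def __init__(self, name, min_cores, max_cores):
        self.name = name
        self.min_cores = min_cores
        self.max_cores = max_cores
        self.cores = 0

    def inflate(self, granted):
        # Application-specific reaction: accept only what can be used.
        accepted = max(0, min(granted, self.max_cores - self.cores))
        self.cores += accepted
        return accepted

    def deflate(self, requested):
        # Application-specific reaction: never drop below the minimum.
        released = max(0, min(requested, self.cores - self.min_cores))
        self.cores -= released
        return released


class Coordinator:
    """Central resource coordinator that distributes and revokes cores."""

    def __init__(self, total_cores):
        self.free = total_cores
        self.processes = []

    def register(self, proc):
        self.processes.append(proc)
        self.grant(proc, proc.min_cores)

    def grant(self, proc, n):
        accepted = proc.inflate(min(n, self.free))
        self.free -= accepted
        return accepted

    def revoke(self, proc, n):
        released = proc.deflate(n)
        self.free += released
        return released

    def rebalance(self):
        # Assumed policy: hand out free cores one at a time, round-robin.
        progress = True
        while self.free > 0 and progress:
            progress = False
            for proc in self.processes:
                if self.free > 0 and self.grant(proc, 1) == 1:
                    progress = True


# Usage: one process grows into idle cores, then yields some to a newcomer.
coord = Coordinator(total_cores=4)
first = AdaptiveProcess("first", min_cores=1, max_cores=4)
coord.register(first)
coord.rebalance()          # the first process inflates into the idle cores
coord.revoke(first, 2)     # the coordinator asks it to deflate
second = AdaptiveProcess("second", min_cores=1, max_cores=2)
coord.register(second)
coord.rebalance()
```

Keeping the limits inside `inflate` and `deflate` means the coordinator needs no knowledge of how each program uses its cores; it only needs to track how many were accepted or released.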
Modern scientific and server programs require multisocket, multicore machines to achieve good performance. Maximizing the performance of these programs requires careful consideration of program behavior and careful management of hardware resources. In particular, a program's affinity can have a critical performance effect. For such machines, there are many possible affinities for a multithreaded program. In this paper, we present AutoFinity, a solution to automatically generate program affinity policies that consider program behavior and the target machine. The policies are constructed with machine learning and used online to select an affinity. We implemented AutoFinity on a 4-processor, 48-core machine and evaluated it on 18 multithreaded programs with varying thread counts. Our results show that in 12 out of 15 cases where affinity impacts runtime, the policy generated by AutoFinity chose affinities that improved performance versus assignments that do not consider program and machine behavior.