GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has long been a challenge for programmers, and optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption of many-core processors such as GPUs has increased significantly. Hence, in this paper, we propose an integrated power and performance (IPP) prediction model for a GPU architecture that predicts the optimal number of active processors for a given application. The basic intuition is that once an application saturates the peak memory bandwidth, using more cores does not improve performance. We develop an empirical power model for the GPU. Unlike most previous models, which require measured execution times, hardware performance counters, or architectural simulations, IPP predicts execution times to calculate dynamic power events. We then use the outcome of IPP to control the number of running cores. We also model the increase in power consumption that results from increases in temperature. With the predicted optimal number of active cores, we show that we can save up to 22.09% of runtime GPU energy consumption, and 10.99% on average, for the five memory-bandwidth-limited benchmarks.
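The saturation intuition lends itself to a back-of-the-envelope calculation. The sketch below is a minimal illustration of that intuition, not the paper's actual IPP equations; the per-core bandwidth demand, peak bandwidth, and core count are assumed example figures, loosely modeled on a GTX 280-class GPU.

```python
import math

def optimal_active_cores(bw_per_core_gbs: float,
                         peak_bw_gbs: float,
                         total_cores: int) -> int:
    """Smallest core count whose aggregate bandwidth demand reaches the peak.

    For a bandwidth-limited kernel, cores beyond this point add power
    draw without adding throughput, so they are candidates to disable.
    """
    needed = math.ceil(peak_bw_gbs / bw_per_core_gbs)
    return min(needed, total_cores)

# Illustrative figures only: 30 cores (SMs), 141.7 GB/s peak memory
# bandwidth, and ~6 GB/s of bandwidth demanded per active core.
print(optimal_active_cores(6.0, 141.7, 30))  # -> 24; cores 25..30 add no speedup
```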
Inflammation in the central nervous system (CNS) and disruption of its immune privilege are major contributors to the pathogenesis of multiple sclerosis (MS) and of its rodent counterpart, experimental autoimmune encephalomyelitis (EAE). We have previously identified developmental endothelial locus-1 (Del-1) as an endogenous anti-inflammatory factor that inhibits integrin-dependent leukocyte adhesion. Here we show that Del-1 contributes to the immune-privileged status of the CNS. Intriguingly, Del-1 expression decreased in chronic active MS lesions and in the inflamed CNS in the course of EAE. Del-1 deficiency was associated with increased EAE severity, accompanied by increased demyelination and axonal loss. Compared with control mice, Del-1−/− mice displayed enhanced disruption of the blood-brain barrier and increased infiltration of neutrophil granulocytes in the spinal cord in the course of EAE, accompanied by elevated levels of inflammatory cytokines, including IL-17. The augmented levels of IL-17 in Del-1 deficiency derived predominantly from infiltrated CD8+ T cells. The increased EAE severity and neutrophil infiltration due to Del-1 deficiency were reversed in mice lacking both Del-1 and the IL-17 receptor, indicating a crucial role for the IL-17/neutrophil inflammatory axis in EAE pathogenesis in Del-1−/− mice. Strikingly, systemic administration of Del-1-Fc ameliorated clinical relapse in relapsing-remitting EAE. Therefore, Del-1 is an endogenous homeostatic factor in the CNS that protects against neuroinflammation and demyelination. Our findings provide mechanistic underpinnings for the previous implication of Del-1 as a candidate MS susceptibility gene and suggest that Del-1-centered therapeutic approaches may be beneficial in neuroinflammatory and demyelinating disorders.
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures in order to improve application performance is even more difficult. Current approaches rely on programmers to tune their applications by exploring the design space exhaustively, without fully understanding the performance characteristics of their applications.

To provide insight into the performance bottlenecks of parallel applications on GPU architectures, we propose a simple analytical model that estimates the execution time of massively parallel programs. The key component of our model is estimating the number of parallel memory requests (we call this the memory warp parallelism) by considering the number of running threads and the memory bandwidth. Based on the degree of memory warp parallelism, the model estimates the cost of memory requests, thereby estimating the overall execution time of a program. Comparisons between the outcome of the model and the actual execution time on several GPUs show that the geometric mean of the absolute error of our model is 5.4% on micro-benchmarks and 13.3% on GPU computing applications. All the applications are written in the CUDA programming language.
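As a rough illustration of how memory warp parallelism (MWP) enters the cost estimate, the sketch below treats MWP as the number of memory requests whose latencies can overlap. This is a simplified reading of the idea, not the paper's full analytical model, and all inputs are assumed example values.

```python
def estimate_memory_cycles(num_requests: int,
                           mem_latency_cycles: float,
                           mwp: float) -> float:
    """Approximate total memory stall cycles when up to `mwp` memory
    requests are in flight at once: serial cost divided by the overlap."""
    return num_requests * mem_latency_cycles / mwp

# Example: 1000 memory requests with a 400-cycle latency and MWP = 8
# -> ~50,000 stall cycles instead of 400,000 if fully serialized.
print(estimate_memory_cycles(1000, 400.0, 8.0))
```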