Abstract: Multi-tiered memory systems, such as those based on Intel® Xeon Phi™ processors, are equipped with several memory tiers that differ in characteristics including capacity, access latency, bandwidth, energy consumption, and volatility. Properly distributing the application's data objects across the available memory tiers is key to shortening the time-to-solution, but how developers and end-users should determine the most appropriate memory tier for each data object has not been properly addressed to date. In this paper we present a novel methodology to build an extensible framework that automatically identifies the application's most relevant memory objects and places them into the Intel Xeon Phi fast on-package memory. Our proposal works on top of in-production binaries by first exploring the application behavior and then substituting the dynamic memory allocations. This makes the proposal valuable even for end-users who cannot modify the application source code. We demonstrate the value of a framework based on our methodology for several relevant HPC applications, using different allocation strategies to help end-users improve performance with minimal intervention. The results of our evaluation reveal that our proposal is able to identify the key objects to promote into fast on-package memory in order to optimize performance, to the point of even surpassing hardware-based solutions.
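As an illustration of the "identify and place" step, the following minimal sketch selects which profiled memory objects to promote into a capacity-limited fast tier. It is not the allocation strategy used by the authors: the `MemObject` record, the `llc_misses` metric, and the greedy misses-per-byte ranking are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemObject:
    """Hypothetical profile record for one dynamically allocated object."""
    name: str
    size_bytes: int
    llc_misses: int  # assumed metric: last-level cache misses attributed to the object


def select_for_fast_memory(objects, capacity_bytes):
    """Greedily pick objects with the highest misses-per-byte that fit in the fast tier."""
    candidates = [o for o in objects if o.size_bytes > 0]
    # Rank by miss density so that small, heavily accessed objects come first.
    ranked = sorted(candidates, key=lambda o: o.llc_misses / o.size_bytes, reverse=True)
    chosen, used = [], 0
    for obj in ranked:
        # Skip anything that would overflow the remaining fast-memory capacity.
        if used + obj.size_bytes <= capacity_bytes:
            chosen.append(obj)
            used += obj.size_bytes
    return chosen


if __name__ == "__main__":
    profile = [
        MemObject("grid", 100, 1000),
        MemObject("halo", 50, 900),
        MemObject("log_buffer", 200, 400),
    ]
    for obj in select_for_fast_memory(profile, capacity_bytes=160):
        print(obj.name)
```

A greedy density ranking is only one possible strategy; it is simple and fast but not guaranteed to be optimal, since the underlying selection problem is a knapsack problem.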
With ever larger systems being constantly deployed, trace-based performance analysis of parallel applications has become a daunting task. Even if the amount of performance data gathered per process is small, traces rapidly become unmanageable when the information collected from all processes is merged. In general, an efficient analysis of such a large volume of data requires a prior filtering step that directs the analyst's attention towards what is meaningful for understanding the observed application behavior. Furthermore, the iterative nature of most scientific applications usually ends up producing repetitive information. Discarding irrelevant data aims at reducing both the size of traces and the time required to perform the analysis and deliver results. In this paper, we present an on-line analysis framework that relies on clustering techniques to intelligently select the most relevant information for understanding how the application behaves, while keeping the trace volume at a reasonable size.
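The idea of using clustering to discard repetitive trace data can be sketched as follows. The abstract does not name a clustering algorithm, so this example substitutes a simple one-pass leader clustering as a stand-in. The feature tuples (for instance, normalized duration and instruction count of a computation burst), the `eps` radius, and both function names are hypothetical.

```python
import math


def leader_clustering(bursts, eps):
    """Assign each burst to the first cluster whose leader lies within eps.

    The leader of a cluster is simply its first member. Returns one label per burst.
    """
    leaders = []
    labels = []
    for features in bursts:
        for cluster_id, leader in enumerate(leaders):
            if math.dist(features, leader) <= eps:
                labels.append(cluster_id)
                break
        else:
            # No existing leader is close enough: this burst starts a new cluster.
            leaders.append(features)
            labels.append(len(leaders) - 1)
    return labels


def keep_representatives(bursts, labels, per_cluster):
    """Keep at most per_cluster bursts from each cluster, dropping repetitive ones."""
    kept, counts = [], {}
    for burst, cluster_id in zip(bursts, labels):
        if counts.get(cluster_id, 0) < per_cluster:
            kept.append(burst)
            counts[cluster_id] = counts.get(cluster_id, 0) + 1
    return kept


if __name__ == "__main__":
    # Assumed features per burst: (normalized duration, normalized instruction count).
    bursts = [(1.0, 1.0), (1.1, 0.9), (5.0, 5.0), (1.05, 1.0), (5.1, 4.9)]
    labels = leader_clustering(bursts, eps=0.5)
    print(labels)
    print(keep_representatives(bursts, labels, per_cluster=1))
```

In an iterative application most bursts fall into a handful of clusters, so keeping a few representatives per cluster bounds the trace size while preserving each distinct behavior. Features would need to be normalized to comparable scales for the Euclidean distance to be meaningful.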