Quantitative characterization of the software layer of a HW/SW co-designed processor

Cano, José; Kumar, Rakesh; Brankovic, Aleksandar; Pavlou, Demos; Stavrouz, Kyriakos; Gibert, Enric; Martínez, Alejandro Rosillo; González, Antonio

doi:10.1109/iiswc.2016.7581274

Cited by 2 publications

(4 citation statements)

References 33 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The TOL overhead varies from almost 0% to 80% (secondary Y-axis). This behavior is expected due to various application characteristics, such as number of indirect branches and the variance in the dynamic / static instruction ratio [59].…”

Section: Tol Behavior At the Global Levelmentioning

confidence: 99%

“…this work, we also wrote technical report [59] about the workload characterization for HW/SW co-designed processors. It includes basic facts about DARCO and characterizes modern benchmarks suites (SPEC2006, Media Bench, Physics Bench).…”

Section: Relationship To Previously Published Workmentioning

confidence: 99%

“…The superblock is built when the edges between basic blocks in the superblock are highly biased. Experimentally, it has been shown that the optimal threshold values are: T H BB =5 and T H SB =10000, as it is shown in [59]. Superblocks are created by including basic blocks until the probability to reach the end of the superblock is below 80% based on profiling information.…”

Section: Transparent Optimization Software Layer (Tol)mentioning

confidence: 99%

“…The statistics that are used for this work are the statistics that are known from the literature to affect the behavior of the TOL the most. They trigger long operations in terms of host instructions [59] and the ones that are studied are listed as follows:…”

Section: Tol Predictabilitymentioning

confidence: 99%

See 3 more Smart Citations

Performance simulation methodologies for hardware/software co-designed processors

Brankovic¹

View full text Add to dashboard Cite

Recently the community started looking into Hardware/Software (HW/SW) co-designed processors as potential solutions to move towards the less power consuming and the less complex designs. Unlike other solutions, they reduce the power and the complexity doing so called dynamic binary translation and optimization from a guest ISA to an internal host custom ISA. This thesis tries to answer the question on how to simulate this kind of architectures. For any kind of processor's architecture, the simulation is the common practice, because it is impossible to build several versions of hardware in order to try all alternatives. The simulation of HW/SW co-designed processors has a big issue in comparison with the simulation of traditional HW-only architectures. First of all, open source tools do not exist. Therefore researches many times assume that the software layer overhead, which is in charge for dynamic binary translation and optimization, is constant or ignored. In this thesis we show that such an assumption is not valid and that can lead to very inaccurate results. Therefore including the software layer in the simulation is a must. On the other side, the simulation is very slow in comparison to native execution, so the community spent a big effort on delivering accurate results in a reasonable amount of time. Therefore it is the common practice for HW-only processors that only parts of application stream, which are called samples, are simulated. Samples usually correspond to different phases in the application stream and usually they are no longer than a few million of instructions. In order to archive accurate starting state of each sample, microarchitectural structures are warmed-up for a few million instructions prior to samples instructions. Unfortunately, such a methodology cannot be directly applied for HW/SW co-designed processors. The warm-up for HW/SW co-designed processors needs to be 3-4 orders of magnitude longer than the warm-up needed for traditional HW-only processor, because the warm-up of software layer needs to be longer than the warm-up of hardware structures. To overcome such a problem, in this thesis we propose a novel warm-up technique specialized for HW/SW co-designed processors. Our solution reduces the simulation time by at least 65X with an average error of just 0.75\%. Such a trend is visible for different software and hardware configurations. The process used to determine simulation samples cannot be applied to HW/SW co-designed processors as well, because due to the software layer, samples show more dissimilarities than in the case of HW-only processors. Therefore we propose a novel algorithm that needs 3X less number of samples to achieve similar error like the state of the art algorithms. Again, such a trend is visible for different software and hardware configurations. Els processadors co-dissenyats Hardware/Software (HW/SW co-designed processors) han estat proposats per l'acadèmia i la indústria com a solucions potencials per a fabricar processadors menys complexos i que consumeixen menys energia. A diferència d'altres alternatives, aquest tipus de processadors redueixen la complexitat i el consum d'energia aplicant traducció y optimització dinàmica de binaris des d'un repertori d'instruccions (instruction set architecture) extern cap a un repertori d'instruccions intern adaptat. Aquesta tesi intenta resoldre els reptes relacionats a la simulació d'aquest tipus d'arquitectures. La simulació és un procés comú en el disseny i desenvolupament de processadors ja que permet explorar diverses alternatives sense haver de fabricar el hardware per a cadascuna d'elles. La simulació de processadors co-dissenyats Hardware/Software és un procés més complex que la simulació de processadores tradicionals, purament hardware. Per exemple, no existeixen eines de simulació disponibles per a la comunitat. Per tant, els investigadors acostumen a assumir que la capa de software, que s'encarrega de la traducció i optimització de les aplicacions, no té un pes específic i, per tant, uns costos computacionals baixos o constants en el millor dels casos. En aquesta tesis demostrem que aquestes premisses són incorrectes i que els resultats amb aquestes acostumen a ser molt imprecisos. Una primera conclusió d'aquesta tesi doncs és que la simulació de la capa software és totalment necessària. A més a més, degut a que els processos de simulació són lents, s'han proposat tècniques de simulació que intenten obtenir resultats precisos en el menor temps possible. Una pràctica habitual és la simulació només de parts de les aplicacions, anomenades mostres, en el disseny de processadors convencionals, purament hardware. Aquestes mostres corresponen a diferents fases de les aplicacions i acostumen a ser de pocs milions d'instruccions. Per tal d'aconseguir un estat microarquitectònic acurat per a cadascuna de les mostres, s'acostumen a estressar aquestes estructures microarquitectòniques del simulador abans de començar a extreure resultats, procés anomenat "escalfament" (warm-up). Desafortunadament, aquesta metodologia no pot ser aplicada a processadors co-dissenyats Hardware/Software. L'"escalfament" de les estructures internes del simulador en el disseny de processadores co-dissenyats Hardware/Software són 3-4 ordres de magnitud més gran que el mateix procés d' "escalfament" en simulacions de processadors convencionals, ja que en els primers cal "escalfar" també les estructures i l'estat de la capa software. En aquesta tesi proposem tècniques de simulació basades en l' "escalfament" de les estructures que redueixen el temps de simulació en 65X amb un error mig del 0,75%. Aquests resultats són extrapolables a diferents configuracions del hardware i de la capa software. Finalment, les tècniques convencionals de selecció de mostres d'aplicacions a simular no són aplicables tampoc a la simulació de processadors co-dissenyats Hardware/Software degut a que les mostres es comporten de manera molt diferent quan es té en compte la capa software. En aquesta tesi, proposem un nou algorisme que redueix 3X el nombre de mostres a simular comparat amb els algorismes tradicionals per a processadors convencionals per a obtenir un error similar. Aquests resultats també són extrapolables a diferents configuracions de hardware i de software. En conclusió, en aquesta tesi es respon al repte de com simular processadors co-dissenyats Hardware/Software, que són una alternativa al disseny tradicional de processadors. Hem demostrat que cal simular la capa software i s'han proposat noves tècniques i algorismes eficients d' "escalfament" i selecció de mostres que són tolerants a diferents configuracions

show abstract

Section: Tol Behavior At the Global Levelmentioning

confidence: 99%

Section: Relationship To Previously Published Workmentioning

confidence: 99%

Section: Transparent Optimization Software Layer (Tol)mentioning

confidence: 99%

Section: Tol Predictabilitymentioning

confidence: 99%

See 2 more Smart Citations

Performance simulation methodologies for hardware/software co-designed processors

Brankovic¹

View full text Add to dashboard Cite

show abstract

HW/SW co-designed processors: Challenges, design choices and a simulation infrastructure for evaluation

Kumar

Cano

Brankovicy³

et al. 2017

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Self Cite

View full text Add to dashboard Cite

Improving single thread performance is a key challenge in modern microprocessors especially because the traditional approach of increasing clock frequency and deep pipelining cannot be pushed further due to power constraints. Therefore, researchers have been looking at unconventional architectures to boost single thread performance without running into the power wall. HW/SW co-designed processors like Nvidia Denver, are emerging as a promising alternative. However, HW/SW co-designed processors need to address some key challenges such as startup delay, providing high performance with simple hardware, translation/optimization overhead, etc. before they can become mainstream. A fundamental requirement for evaluating different design choices and trade-offs to meet these challenges is to have a simulation infrastructure. Unfortunately, there is no such infrastructure available today. Building the aforementioned infrastructure itself poses significant challenges as it encompasses the complexities of not only an architectural framework but also of a compilation one. This paper identifies the key challenges that HW/SW codesigned processors face and the basic requirements for a simulation infrastructure targeting these architectures. Furthermore, the paper presents DARCO, a simulation infrastructure to enable research in this domain.Peer ReviewedPostprint (author's final draft

show abstract

Quantitative characterization of the software layer of a HW/SW co-designed processor

Cited by 2 publications

References 33 publications

Performance simulation methodologies for hardware/software co-designed processors

Performance simulation methodologies for hardware/software co-designed processors

HW/SW co-designed processors: Challenges, design choices and a simulation infrastructure for evaluation

Contact Info

Product

Resources

About