Chip temperature is a major hurdle in microprocessor design, so techniques that recover the performance lost to thermal emergency mechanisms are crucial to sustaining performance growth. Many power-reduction techniques proposed in the past, and some more recent thermal-management techniques, have helped alleviate this problem. Probably the most important thermal control technique is dynamic voltage and frequency scaling (DVS), which allows an almost cubic reduction in power at the cost of a performance penalty that is at most linear. So far, DVS techniques for temperature control have been studied at the chip level. Finer-grained DVS is feasible if a Globally-Asynchronous Locally-Synchronous (GALS) design style is employed. GALS, also known as Multiple-Clock Domain (MCD), allows independent voltage and frequency control for each clock domain on the chip. Several studies on DVS for GALS aim to improve energy and power efficiency, but not temperature. This paper proposes and analyses the use of DVS at the domain level to control temperature in a clustered MCD microarchitecture, with the goal of improving the performance of applications that do not meet the thermal constraints imposed by the designers.
KEY WORDS: Multiple clock domain architectures, GALS, DTM, dynamic frequency and voltage scaling
INTRODUCTION
Power directly translates into heat, which must be removed from the processor die to keep the silicon temperature inside a "safe" range. Power density is increasing because frequency and leakage current are scaling up so much that their effect on power cannot be offset by decreasing the supply voltage. This trend raises the cost of the cooling system and challenges the performance benefits that can be obtained from ever-growing transistor density. The resulting cooling system cost is on the order of $1-$3 or more per watt when the average power exceeds 40 W [1][2], which represents a significant part of the total cost of the chip. This is especially important for data centers, where air conditioning is a main contributor to the total cost [3]. In addition, circuit reliability depends exponentially on operating temperature. Temperature variations account for over 50% of electronic failures [4].