In this paper, we investigate the power implications of tile size selection for tile-based processors. We refer to this investigation as a tile granularity study. This is accomplished by distilling the architectural cost of tiles with different computational widths into a system metric we call the Granularity Indicator (GI). The GI is then compared against the communications exposed when algorithms are partitioned across multiple tiles. Through this comparison, the tile granularity that best fits a given set of algorithms can be determined, reducing the system power for that set of algorithms. When the GI analysis is applied to the Synchroscalar tile architecture [1], we find that Synchroscalar's already low power consumption can be further reduced by 14% when customized for execution of the 802.11a receiver. In addition, the GI can also be used to evaluate tile size when considering multiple applications simultaneously, providing a convenient platform for hardware-software co-design.
We present Synchroscalar, a tile-based architecture for embedded processing that is designed to provide the flexibility of DSPs while approaching the power efficiency of ASICs. We achieve this goal by providing high parallelism and voltage scaling while minimizing control and communication costs. Specifically, Synchroscalar uses columns of processor tiles organized into statically-assigned frequency-voltage domains to minimize power consumption. Furthermore, while columns use SIMD control to minimize overhead, data-dependent computations can be supported by extremely flexible statically-scheduled communication between columns. We provide a detailed evaluation of Synchroscalar including SPICE simulation, wire and device models, synthesis of key components, cycle-level simulation, and compiler- and hand-optimized signal processing applications. We find that the goal of meeting, not exceeding, performance targets with data-parallel applications leads to designs that depart significantly from our intuitions derived from general-purpose microprocessor design. In particular, synchronous design and substantial global interconnect are desirable in the low-frequency, low-power domain. This global interconnect supports parallelization and reduces processor idle time, which are critical to energy-efficient implementations of high-bandwidth signal processing. Overall, Synchroscalar provides programmability while achieving power efficiencies within 8-30X of known ASIC implementations, which is 10-60X better than conventional DSPs. In addition, frequency-voltage scaling in Synchroscalar provides power savings of 3-32% in our application suite.
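The benefit of statically assigning each column its own frequency-voltage domain can be illustrated with the textbook dynamic power relation P = C·V²·f. The sketch below is not Synchroscalar's power model. The linear voltage-frequency relation, the voltage floor, the capacitance, and the three column workloads are all hypothetical values chosen for illustration, and leakage and clock gating are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    required_mhz: float  # hypothetical clock rate needed to just meet the target


def supply_voltage(freq_mhz, f_max_mhz=400.0, v_max=1.2, v_min=0.7):
    """Assumed linear voltage-frequency relation, clamped to [v_min, v_max]."""
    v = v_max * freq_mhz / f_max_mhz
    return max(v_min, min(v_max, v))


def dynamic_power_mw(freq_mhz, volts, c_eff_nf=1.0):
    """P = C * V^2 * f. With C in nF and f in MHz, the result is in mW."""
    return c_eff_nf * volts ** 2 * freq_mhz


def power_single_domain(columns):
    """Every column is clocked at the rate of the most demanding column."""
    f = max(c.required_mhz for c in columns)
    return len(columns) * dynamic_power_mw(f, supply_voltage(f))


def power_per_column_domains(columns):
    """Each column runs at its own statically assigned frequency and voltage."""
    return sum(
        dynamic_power_mw(c.required_mhz, supply_voltage(c.required_mhz))
        for c in columns
    )


# Hypothetical workload: three columns with different throughput needs.
columns = [
    Column("fft", 400.0),
    Column("demap", 200.0),
    Column("control", 100.0),
]

single = power_single_domain(columns)
scaled = power_per_column_domains(columns)
print(f"single domain:      {single:.1f} mW")
print(f"per-column domains: {scaled:.1f} mW")
print(f"savings:            {100 * (1 - scaled / single):.1f}%")
```

Because power grows with the square of the supply voltage, slowing a lightly loaded column and lowering its voltage saves more than slowing it alone. The savings printed here follow only from the made-up inputs and should not be compared with the 3-32% range reported for the application suite.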