Min-Wuk Lee scite author profile

SoC design is composed of two major parts; the design of computing cores and their communication architecture. As the die sizes and the number of subsystems on a chip increase, power consumed by the interconnection structures, including clocks, takes significant portion of the overall power-budget. This calls for techniques to reduce the energy consumed in on-chip communication while satisfying quality of service (QoS) requirements such as bandwidth, latency, or reliability. Recently, on-chip networks (OCN) have been studied actively to address these communication problems [1][2][3][4], but their implementations are not energy-efficient so far [1] [3]. In this paper, we report successful implementation of a 51mW 1.6GHz hierarchical star-connected on-chip network supporting 11.2GB/s bandwidth with various low-power circuit techniques.The star-topology guarantees constant and minimum switch hop counts between every communicating IP. However, 1-level flat star-topology [1] as shown in Fig. 8.2.1a results in a number of capacitive global wires that may cause long latency and large power dissipation. Figure 8.2.1b shows a hierarchical star-connected SoC which is composed of several clusters of tightly-connected IPs for their communication locality. Intra-cluster local links provide high-bandwidth with shorter latency and less energy consumption, and inter-cluster global links show higher link utilization by link-sharing. Figure 8.2.2 shows the OCN-based SoC platform applicable to low-power mobile devices [5]. The OCN has two separate networks: forward networks and backward networks that configure the Master-to-Slave path and Slave-to-Master path, respectively [1]. To reduce the area of OCN, 100MHz packets are serialized by Up-Sampler with a 1.6GHz network clock before transmission and then deserialized by Down-Sampler upon arrival. To deserialize a packet without a globally synchronized clock, a strobe signal is transmitted together with the packet. The strobe and the packet experience the same wire-delay without skew. A forward network packet consists of 32b address, 32b data, and 16b header fields while a backward one does not have the address field. The packet header generated by a network interface contains routing information, a type of burst length, a read/write command, an acknowledgment request, and a QoS level.The global link connecting clusters in the 2nd level star-topology is usually several millimeters long. By using overdrivers [6], clocked sense-amplifiers and twisted differential signaling, packets are transmitted reliably with less than 600mV swing. The sizes of a tranceiver and the overdrive voltage are chosen to obtain a 200mV separation at the receiver end as shown in Fig. 8.2.3. A 5mm global link of 1.6µm wire-pitch can carry a packet at 1.6GHz with 320ps wire-delay and consumes 35pJ/packet (= 0.35pJ/bit). In contrary, a full-swing link consumes up to 3x more power and additional area of repeaters.A crossbar switch for intra-cluster packets performs buffer-less cut-through switching to minimize ...

show abstract

Development of a 3-D graphics rendering engine with lighting acceleration for handheld multimedia systems

Nam

Lee

Yoo

2005

IEEE Trans. Consumer Electron.

View full text Add to dashboard Cite

A low-power three-dimensional (3-D) graphics rendering engine with lighting acceleration is designed and implemented for handheld multimedia terminals. The lighting unit is hardware implemented and integrated into the chip for the low-power acceleration of the 3D graphics applications. We adopt the following three steps to handle the memory bandwidth problem for rendering operations. I) We find bilinear MIPMAP is the best texture filtering algorithm for handheld systems based on our developed energy-efficiency metric. With this observation, we adopt bilinear MIPMAP for our texture filtering unit, which requires only 50% of texture memory bandwidth compared with trilinear MIPMAP filtering, II) We put the depth test operation into the earlier stage of the graphics pipeline, which eliminates texture memory accesses for invisible pixels, III) We develop a power-efficient small cache system as the interface to rendering memory. The accelerator takes 181K gates and the performance reaches 20Mpixels/s. A test chip is implemented with 1-poly 6-metal 0,18um CMOS technology. It operates at the frequency of 20MHz with 14.7mW power consumption 1 .

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Min-Wuk Lee

A 0.7-fJ/bit/search 2.2-ns search time hybrid-type TCAM architecture

A 155-mW 50-m vertices/s graphics processor with fixed-point programmable vertex shader for mobile applications

A 50Mvertices/s graphics processor with fixed-point programmable vertex shader for mobile applications

A 51mW 1.6GHz on-chip network for low-power heterogeneous SoC platform

Development of a 3-D graphics rendering engine with lighting acceleration for handheld multimedia systems

Contact Info

Product

Resources

About