Kambiz Samadi scite author profile

Double patterning lithography seems to be a prominent choice for 32nm and 22nm technologies. Double patterning lithography techniques require additional masks for a single interconnect layer. Consequently, mask shift-induced overlay errors introduce additional variability into interconnect coupling capacitances. An important open question is whether overlay-induced performance impacts are more significant than performance variations caused by variability in interconnects. We provide TCAD as well as chip-level analyses to determine whether overlay error should receive more attention than interconnect variations during interconnect manufacturing. We develop conclusions to help determine which component should be given more importance in specific double patterning process variants.

ORION 2.0: A Power-Area Simulator for Interconnection Networks

Kahng

Peh

et al. 2012

IEEE Trans. VLSI Syst.

226

Abstract: As industry moves towards multi-core chips, networks-on-chip (NoCs) are emerging as the scalable fabric for interconnecting the cores. With power now the first-order design constraint, early-stage estimation of NoC power has become crucially important. In this work, we present ORION 2.0, an enhanced NoC power and area simulator, which offers significant accuracy improvement relative to its predecessor, ORION 1.0 [18].

Index Terms: Network-on-chip, architectural-level modeling, design space exploration.

I. INTRODUCTION

Network power has become increasingly substantial in multi-core designs, with the increasing demand for network bandwidth. This requires designers to accurately estimate on-chip network power consumption. Power estimation can be carried out at different levels of abstraction that trade off estimation time versus accuracy, ranging from real-chip power measurements [5], to pre- and post-layout transistor-level simulations [23], to RTL power estimation tools [25], to early-stage architectural power models [4], [19], [18], [9]. Low-level power estimation tools, even RTL power estimation, require complete RTL code to be available, and simulate slowly, on the order of hours, while evaluation of an architectural power model takes on the order of seconds.

Architectural power estimation is important to (1) verify that power budgets are approximately met by the different parts of the design and the entire design, and (2) evaluate the effect of high-level optimizations, which have more significant impact on power than low-level optimizations [9]. Patel et al. [16] proposed a power model for interconnection networks based on transistor count. As the model is not instantiated with architectural parameters, it cannot be used to explore tradeoffs in router microarchitecture design. Bona et al. [3] gave a methodology for automatically generating the energy models for on-chip communication infrastructure at system level; however, the focus is on bus-based and crossbar-based communication for SoC. Bhat et al. [2] proposed an architecture-level regression analysis model for different router components based on energy numbers obtained from simulations using Magma [25] tools. ORION 1.0, a set of architectural power models for on-chip interconnection routers, was proposed in [18] and has been widely used for early-stage NoC power estimation in literature and industry. However, for the Intel 80-core Teraflops chip [10] there is up to 8X difference between ORION 1.0 estimations (per component) and silicon measurements. Also, the estimated total power is about 10X less than actual. Indeed, ORION 1.0 does not include clock and link power models, which are major components of NoC power.

In addition, since architectural design space exploration is typically done for current and future technologies, models must be derivable from standard technology files (e.g., Liberty [23], LEF [22]), as well as extrapolatable process models such as PTM [24] or ITRS [21], whereas ORION 1.0 collects

SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks

et al. 2018

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks

Yazdanbakhsh

Samadi

Kim

et al. 2018

Generative Adversarial Networks (GANs) are one of the most recent deep learning models that generate synthetic data from limited genuine datasets. GANs are on the frontier as further extension of deep learning into many domains (e.g., medicine, robotics, content synthesis) requires massive sets of labeled data that is generally either unavailable or prohibitively costly to collect. Although GANs are gaining prominence in various fields, there are no accelerators for these new models. In fact, GANs leverage a new operator, called transposed convolution, that exposes unique challenges for hardware acceleration. This operator first inserts zeros within the multidimensional input, then convolves a kernel over this expanded array to add information to the embedded zeros. Even though there is a convolution stage in this operator, the inserted zeros lead to underutilization of the compute resources when a conventional convolution accelerator is employed. We propose the GANAX architecture to alleviate the sources of inefficiency associated with the acceleration of GANs using conventional convolution accelerators, making the first GAN accelerator design possible. We propose a reorganization of the output computations to allocate compute rows with similar patterns of zeros to adjacent processing engines, which also avoids inconsequential multiply-adds on the zeros. This compulsory adjacency reclaims data reuse across these neighboring processing engines, which had otherwise diminished due to the inserted zeros. The reordering breaks the full SIMD execution model, which is prominent in convolution accelerators. Therefore, we propose a unified MIMD-SIMD design for GANAX that leverages repeated patterns in the computation to create distinct microprograms that execute concurrently in SIMD mode. The interleaving of MIMD and SIMD modes is performed at the granularity of a single microprogrammed operation. To amortize the cost of MIMD execution, we propose a decoupling of data access from data processing in GANAX. This decoupling leads to a new design that breaks each processing engine into an access micro-engine and an execute micro-engine. The proposed architecture extends the concept of access-execute architectures to the finest granularity of computation for each individual operand. Evaluations with six GAN models show, on average, 3.6× speedup and 3.1× energy savings over EYERISS without compromising the efficiency of conventional convolution accelerators. These benefits come with a mere ≈7.8% area increase. These results suggest that GANAX is an effective initial step that paves the way for accelerating the next generation of deep neural models.
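
The zero-insertion step described in this abstract can be illustrated with a small one-dimensional sketch. This is not the GANAX implementation; it is a minimal pure-Python illustration, assuming a 1D input, full zero padding, and a flipped kernel (real GAN layers are multidimensional, and frameworks differ on the flip convention). The names `insert_zeros` and `transposed_conv1d` are hypothetical. The sketch also counts how many multiply-adds touch an original input sample, which shows the underutilization that a conventional convolution engine would see.

```python
def insert_zeros(signal, stride):
    """Place (stride - 1) zeros between consecutive input samples."""
    expanded = []
    for i, value in enumerate(signal):
        expanded.append(value)
        if i < len(signal) - 1:
            expanded.extend([0] * (stride - 1))
    return expanded


def transposed_conv1d(signal, kernel, stride):
    """Zero-insert, pad, then slide the kernel over the expanded array.

    Returns (output, useful, total), where `useful` counts multiply-adds
    whose operand is an original sample and `total` counts all of them.
    """
    expanded = insert_zeros(signal, stride)
    pad = len(kernel) - 1                      # full padding on both sides
    padded = [0] * pad + expanded + [0] * pad
    flipped = kernel[::-1]                     # assumption: true convolution
    output, useful, total = [], 0, 0
    for start in range(len(padded) - len(kernel) + 1):
        acc = 0
        for j, weight in enumerate(flipped):
            src = start + j - pad              # index into `expanded`
            total += 1
            # Original samples sit at multiples of `stride` in `expanded`.
            if 0 <= src < len(expanded) and src % stride == 0:
                useful += 1
            acc += weight * padded[start + j]
        output.append(acc)
    return output, useful, total


out, useful, total = transposed_conv1d([1, 2, 3], [1, 10, 100], stride=2)
print(out)             # [1, 10, 102, 20, 203, 30, 300]
print(useful, total)   # 9 21
```

In this toy case only 9 of the 21 multiply-adds involve an original input sample; the rest multiply by inserted or padding zeros, which is the kind of inconsequential work the abstract says GANAX avoids.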

Design and CAD methodologies for low power gate-level monolithic 3D ICs

Panth

Samadi

et al. 2014

106

In a gate-level monolithic 3D IC (M3D), all the transistors in a single logic gate occupy the same tier, and gates in different tiers are connected using nano-scale monolithic inter-tier vias. This design style has the benefit of the superior power-performance quality offered by flat implementations (unlike block-level M3D), and zero total silicon area overhead compared to 2D (unlike transistor-level M3D). In this paper we develop, for the first time, a complete RTL-to-GDSII design flow for gate-level M3D. Our tool flow is based on commercial tools built for 2D ICs and enhanced with our 3D-specific methodologies. We use this flow along with a 28nm PDK to build layouts for the OpenSPARC T2 core. Our simulations show that at the same performance, gate-level M3D offers 16% total power reduction with 0% area overhead compared to commercial quality 2D IC designs.

CMP Fill Synthesis: A Survey of Recent Studies

Kahng

Samadi

2008

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

Abstract: This paper presents extensions of the dynamic-programming (DP) framework to consider buffer insertion and wire sizing under the effects of process variation. We study the effectiveness of this approach in reducing the timing impact caused by chemical-mechanical planarization (CMP)-induced systematic variation and random L_eff process variation in devices. We first present a quantitative study of the impact of CMP on interconnect parasitics. We then introduce a simple extension to handle CMP effects in the buffer insertion and wire sizing problem by simultaneously considering fill insertion (SBWF). We also tackle the same problem but with random L_eff process variation (vSBWF) by incorporating statistical timing into the DP framework. We develop an efficient yet accurate heuristic pruning rule to approximate the computationally expensive statistical problem. Experiments under conservative assumptions on process variation show that the SBWF algorithm obtains 1.6% timing improvement over the variation-unaware solution. Moreover, our statistical vSBWF algorithm results in 43.1% yield improvement on average. We also show that our approaches have polynomial time complexity with respect to the net size. The proposed extensions to the DP framework are orthogonal to other power/area-constrained problems under the same framework, which have been extensively studied in the literature.

FlexiGAN: An End-to-End Solution for FPGA Acceleration of Generative Adversarial Networks

Yazdanbakhsh

Brzozowski

Khaleghi

et al. 2018

Quantified Impacts of Guardband Reduction on Design Process Outcomes

Jeong

Kahng

Samadi

2008

A major source of patterning problems in low-k1 lithography is line-end pullback. Though geometric metrics such as CD at gate edge have served as good indicators, the ever-rising contribution of line-end extension to layout area necessitates reducing pessimism in qualifying line-end patterning. Electrically-aware metrics for line-ends can be helpful in this regard. In this work, we calculate the I_on and I_off impact of line-end taper shapes as well as line-end length. The proposed models are verified using TCAD simulation in a typical 65nm process. We observe that the device threshold voltage is a weak function of line-end pullback, and that the electrical impact of the taper can vary with overlay errors. We apply a non-uniform channel length model in addition to the proposed taper-dependent threshold voltage model to calculate ∆I_on and ∆I_off. Finally, the electrical metric for line-end printing is defined as expected change in I_on or I_off under a given overlay error distribution. We also propose a super-ellipse form to parameterize taper shapes, and then explore a large variety of taper shapes to characterize electrical impact.
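
The abstract does not give the exact super-ellipse form used for taper shapes. As a rough illustration only, the sketch below uses the standard super-ellipse |x/a|^n + |y/b|^n = 1 and evaluates its upper boundary. The function name `superellipse_y` and the parameters `a`, `b`, and `n` are assumptions for this example, not the paper's notation.

```python
def superellipse_y(x, a, b, n):
    """Upper-half boundary of |x/a|**n + |y/b|**n = 1 for |x| <= a.

    a, b: half-widths along x and y (hypothetical shape parameters).
    n:    exponent; n = 2 gives an ellipse, larger n gives a boxier outline.
    """
    if a <= 0 or b <= 0 or n <= 0:
        raise ValueError("a, b and n must be positive")
    if abs(x) > a:
        raise ValueError("x lies outside the super-ellipse")
    return b * (1.0 - abs(x / a) ** n) ** (1.0 / n)


# Sweep the exponent to see how one parameter changes the outline near x = a.
for n in (1, 2, 4, 8):
    print(n, round(superellipse_y(0.9, a=1.0, b=1.0, n=n), 3))
```

Sweeping a single exponent moves the outline from a sharp, pointed profile (small n) toward a nearly rectangular one (large n), which is why a form like this is convenient for exploring many shapes with few parameters.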

ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration
