Safeen Huda scite author profile

The rapidly changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware accelerator search framework that defines a broad optimization environment covering key design decisions within the hardware-software stack, including hardware datapath, software scheduling, and compiler passes such as operation fusion and tensor padding. In this paper, we analyze bottlenecks in state-of-the-art vision and natural language processing (NLP) models, including EfficientNet [91] and BERT [19], and use FAST to design accelerators capable of addressing these bottlenecks. FAST-generated accelerators optimized for single workloads improve Perf/TDP by 3.7× on average across all benchmarks compared to TPU-v3. A FAST-generated accelerator optimized for serving a suite of workloads improves Perf/TDP by 2.4× on average compared to TPU-v3. Our return on investment analysis shows that FAST-generated accelerators can potentially be practical for moderate-sized datacenter deployments.

CCS Concepts: Hardware → Electronic design automation; Computer systems organization → Parallel architectures.

A Novel STT-MRAM Cell With Disturbance-Free Read Operation

Huda

Sheikholeslami

2013

IEEE Trans. Circuits Syst. I

Clock gating architectures for FPGA power reduction

Huda

Mallick

Anderson

2009

Clock gating is a power reduction technique that has been used successfully in the custom ASIC domain. Clock and logic signal power are saved by temporarily disabling the clock signal on registers whose outputs do not affect circuit outputs. We consider and evaluate FPGA clock network architectures with built-in clock gating capability and describe a flexible placement algorithm that can operate with various gating granularities (various sizes of device regions containing clock loads that can be gated together). Results show that depending on the clock gating architecture and the fraction of time clock signals are enabled, clock power can be reduced by over 50%, and results suggest that a fine granularity gating architecture yields significant power benefits.

A Survey on Circuit Modeling of Spin-Transfer-Torque Magnetic Tunnel Junctions

Vatankhahghadim

Huda

Sheikholeslami

2014

IEEE Trans. Circuits Syst. I

Accurate modeling of the magnetic tunnel junction (MTJ) is critical for the design of memories such as spin-transfer-torque magnetoresistive random access memory (STT-MRAM) and spin logic circuits such as spin flip-flops. This paper reviews several static and dynamic models for the MTJ and compares them for their capabilities and limitations. Furthermore, a Verilog-A model is developed to predict dynamic characteristics of the MTJ. These models are used in simulating a prototype circuit to illustrate their strengths and weaknesses.

Index Terms: Magnetic tunnel junction (MTJ), magnetoresistive random-access memory (MRAM), modeling, spin-transfer-torque (STT).

Safeen Huda

Negative-resistance read and write schemes for STT-MRAM in 0.13µm CMOS

A full-stack search technique for domain optimized deep learning accelerators

A Novel STT-MRAM Cell With Disturbance-Free Read Operation

Clock gating architectures for FPGA power reduction

A Survey on Circuit Modeling of Spin-Transfer-Torque Magnetic Tunnel Junctions
