K. Ochii scite author profile

To move High-Performance Computing (HPC) closer to forward operating environments and missions, the Army Research Laboratory is developing approaches using hybrid, asymmetric core computing. By blending capabilities found in Graphics Processing Units (GPUs) and traditional von Neumann multicore Central Processing Units (CPUs), approaches are being developed and optimized to provide at or near real-time processing speeds for research project applications. Algorithms are designed to partition work to resources best designed to handle the processing load. The use of commodity resources allows the design to be flexible throughout the life cycle without the costly and time-consuming delays associated with Application-Specific Integrated Circuit (ASIC) development. This paradigm allows for rapid technology transfer to end users. In this paper, we describe a synchronous impulse reconstruction radar imaging algorithm that has been designed for hybrid CPU-GPU processing. We discuss various optimizations such as asynchronous task partitioning between the CPU and GPU as well as data movement reduction. We also discuss analysis and design of the algorithms within the context of two programming models: NVIDIA's CUDA and AMD's ATI Brook+. Finally, we report on the speedup achieved by this approach that allowed us to take a code once restricted to postprocessing and transform it into one that exceeds real-time performance requirements.

An 8 ns 1 Mb ECL BiCMOS SRAM

Matsui

Momose

Urakawa

et al.

A 1Mb x 1 ECL SRAM fabricated with a 0.8µm BiCMOS technology has 8ns access time and is 10K I/O compatible. To achieve sub-10ns address access time and low power consumption, an ECL-CMOS level converter, a bit-line peripheral circuit and an automatic power saving function are employed.

The chip architecture is shown in Figure 1. Inputs are received by an ECL input buffer and translated to CMOS levels, and address decoding is executed. The cell array consists of 512 rows by 2048 columns, and is divided into 16 sections. Each section has 128 columns and four local amplifiers, allowing for conversion to a 4b-wide configuration. To reduce both the word-line delay and the active power, a modulated double word-line structure was adopted [1]. Only one section is activated at a time by a section word-line (SWL), which is selected using a NOR gate by a main word-line (MWL) and one of the four section selection lines. This structure can relax the pitch of the main word-line driver and can also relax bipolar transistor size. The polysilicon section word-line is connected to the aluminum section word-line every 16 cells. The total word-line delay is less than 1ns. The 4b-wide global data are multiplexed and output by an ECL buffer.

Figure 2 shows the ECL input buffer and ECL-CMOS level converter. The output of this buffer is directly converted to CMOS level without ECL predecoding to reduce power consumption. The converter consists of an NMOS dual cross-coupled latch and two PMOS FETs, and the reference voltage of Vbb-Vbe-Vtp is applied to the PMOS gates for detecting input-buffer output levels. The complementary outputs for address Ai are available simultaneously because of the symmetrical geometry of the converter. Thus, the converter is suitable for use as an address buffer. The output of the converter supplies CMOS levels with no dc current.

A BiCMOS bit-line peripheral circuit, illustrated in Figure 3, is used to minimize sense delay. The bit-line voltage swing is limited to about 50mV by a normally-on bit-line equalization circuit, in which the bit-line equalizing transistors are normally activated during a read cycle. Thus, bit-line recovery time during data switching is reduced. The access time advantage of using the normally-on bit-line equalization circuit is about 30%. During write operation the equalization transistors are cut off. The bit-line voltage of Vcc-2Vbe is generated by a Darlington transistor. A PMOS load is inserted between the bit-line voltage source and the bit-line pairs. A two-stage sensing circuit, with a bipolar differential pair, is used.

An automatic power saving (APS) function utilizing an address transition detection (ATD) technique is applied to the ECL SRAM in order to reduce power consumption during the read cycle. The cell array and first sense amplifiers are activated by signal ΦAPS, which is generated from the ATD pulse and is used only for activation, not for equilibration. A further figure shows a circuit diagram of a s...

[1] Sakurai, T., et al., "A Low Power 46ns 256Kbit CMOS Static RAM with Dynamic Double Word Line", IEEE
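The array organization described in this abstract can be checked with a little arithmetic. The sketch below is only an illustration: the geometry (512 rows, 2048 columns, 16 sections of 128 columns) comes from the abstract, while the function names and the ordering of address bits as row, section, column are assumptions, since the abstract does not give the actual address mapping.

```python
from typing import List, Tuple

# Geometry stated in the abstract.
ROWS = 512
COLUMNS = 2048
SECTIONS = 16
COLUMNS_PER_SECTION = 128

# Assumed (hypothetical) bit layout of a 20-bit address: [row | section | column].
COLUMN_BITS = 7   # 2**7 = 128 columns per section
SECTION_BITS = 4  # 2**4 = 16 sections


def decode_address(addr: int) -> Tuple[int, int, int]:
    """Split a bit address into (row, section, column within the section)."""
    if not 0 <= addr < ROWS * COLUMNS:
        raise ValueError("address outside the 1Mb array")
    column = addr & (COLUMNS_PER_SECTION - 1)
    section = (addr >> COLUMN_BITS) & (SECTIONS - 1)
    row = addr >> (COLUMN_BITS + SECTION_BITS)
    return row, section, column


def section_word_lines(section: int) -> List[bool]:
    """One flag per section: only the selected section's word-line is active."""
    return [s == section for s in range(SECTIONS)]


if __name__ == "__main__":
    print(ROWS * COLUMNS)                # total bits in the array
    print(decode_address(ROWS * COLUMNS - 1))
```

The point of the sketch is that 512 x 2048 gives exactly 2^20 bits, 16 x 128 gives the 2048 columns, and for any address exactly one of the 16 section word-lines is active, which is what keeps the word-line load and active power low.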

Consideration of poly-Si loaded cell capacity limits for low-power and high-speed SRAMs

Kato

Sato

Matsui

et al. 1992

IEEE J. Solid-State Circuits

K. Ochii

TFT (thin film transistor) cell technology for 4 Mbit and more high density SRAMs

SRAM cell stability under the influence of parasitic resistances and data holding voltage as a stability prober

A 1 µA retention 4 Mb SRAM with a thin-film-transistor load cell

An 8 ns 1 Mb ECL BiCMOS SRAM

Consideration of poly-Si loaded cell capacity limits for low-power and high-speed SRAMs
